NAME
srcml - Conversion of source code to/from the srcML format, querying
and manipulation of srcML
SYNOPSIS
srcml [general-options] [srcML-options] [transformations]
[output-src-options] [input] [output]
DESCRIPTION
The program srcml supports the srcML format. The srcML format presents
an XML view of source code for addressing, querying, and transforma-
tion. The tool translates source code into the XML source-code repre-
sentation srcML, where standard XML tools, available directly from the
srcml program, can query and transform the srcML. The tool can then
convert the srcML back to source code.
The srcML format preserves all text of the source code, including
white-space, comments, and preprocessor statements. The C Preprocessor
is not run on the source code. The program works on large projects,
individual source-code files, or code fragments, including individual
statements.
The conversion to the srcML format uses a custom parser that is fast
and tolerant to incomplete source code and uncompilable code.
Use of the character '-' in the place of an input file, or providing no
input file, implies reading from standard input. A source-code language
must be specified when source-code input is from standard input.
GENERAL OPTIONS
-h, --help
Output the help and exit.
-V, --version
Output the version of srcml then exit.
-v, --verbose
Output conversion and status information to stderr, including the
encodings used. Especially useful for monitoring progress when
using the option --files-from, a directory, or a source-code
archive (e.g., tar.gz).
-q, --quiet
Suppresses status messages.
-o file, --output=file
Write the output to file. By default, it writes to standard out-
put.
-j num, --jobs=num
Allow up to num threads for source parsing. Default is 4.
--files-from
Treat the input file as a list of source files. Each file is
separately translated and collectively stored into a single
srcML archive. The list has a single filename on each line.
Ignored lines include blank lines and lines that begin with the
character '#'. As with input and output files, using the charac-
ter '-' in place of a file name takes the input list from stan-
dard input.
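The list format described above can be sketched in Python. This is only
an illustration of the stated rules (one filename per line, blank lines
and lines beginning with '#' ignored), not srcml's own implementation.
The function name read_file_list is hypothetical, and treating
whitespace-only lines as blank is an assumption.

```python
def read_file_list(text):
    """Return the filenames found in a --files-from style list."""
    names = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line (assumption: whitespace-only lines count as blank).
            continue
        if line.startswith("#"):
            # Lines that begin with '#' are ignored.
            continue
        names.append(line)
    return names


listing = "# project sources\nmain.cpp\n\nutil/io.h\n"
print(read_file_list(listing))  # ['main.cpp', 'util/io.h']
```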
-l language, --language=language
Set the programming language of the input source code. Allowable
values are C, C++, C#, and Java. The language affects parsing,
the allowed markup, and what is considered a keyword. The value
is also stored individually as an attribute in each unit. If the
input is a directory or source-code archive (e.g., .tar.gz,
.zip), the language only applies to files with source-code
extensions. Use --register-ext to register non-standard
source-code extensions.
If not specified, the programming language is based on the file exten-
sion. Language must be specified if using standard input. If the file
extension is not available or not in the standard list, then the pro-
gram will skip that file. This allows you to run srcml on a project
directory with source and non-source files, where srcml only parses
files with supported extensions.
--register-ext extension=language
Set the file extension map to a given language. Note that the
extensions do not contain the period character '.', e.g., --reg-
ister-ext "h=C++"
A common use is C++ files that use the .h extension for header files.
By default, these are processed as C source-code files. This option can
be used to override this behavior.
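The extension lookup and the effect of --register-ext can be pictured
with a small Python sketch. The table below is an illustrative subset
only: this page states that .h defaults to C, and the examples use .cpp
and .java, while the "c" and "cs" entries are assumptions. The names
DEFAULT_EXTENSIONS and language_for are hypothetical and are not part of
srcml.

```python
import os

# Illustrative subset; the full default table is not listed in this page.
DEFAULT_EXTENSIONS = {"c": "C", "h": "C", "cpp": "C++", "cs": "C#", "java": "Java"}


def language_for(filename, registered=None):
    """Return the language for a file, or None if the file would be skipped."""
    table = dict(DEFAULT_EXTENSIONS)
    table.update(registered or {})  # --register-ext style overrides
    # Registered extensions do not contain the period character.
    ext = os.path.splitext(filename)[1].lstrip(".")
    return table.get(ext)


print(language_for("widget.h"))                # C
print(language_for("widget.h", {"h": "C++"}))  # C++
print(language_for("README.md"))               # None
```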
--src-encoding=encoding
Use the encoding when processing the input source-code file. The
default is to try to automatically determine this when possible,
i.e., ISO-8859-1 is assumed unless a non-character is detected.
Encodings include "UTF-16", "ISO-10646-UCS-2", and "ISO-8859-1".
On UNIX platforms, a full list of encodings can be obtained by
using the command iconv -l.
--eol=eol
Use eol as the line ending for source-code output. Allowable
values are auto (the default), lf ('UNIX', linefeed), cr
(carriage return), and crlf ('Windows', carriage return followed
by linefeed). In most cases the default auto is sufficient.
-r, --archive
Create a srcML archive, which can contain multiple files in the
srcML format. Default when provided more than one file or a
directory as input.
--output-srcml-inner
Output the XML inside of the srcML unit element. This is not
valid XML as it contains no namespace declarations and does not
necessarily have a single root element.
Examples
srcml --text="a;" -l C++ --output-srcml-outer
<unit revision="1.0.0" language="C++"><expr_stmt><expr><name>a</name></expr>;</expr_stmt></unit>
srcml --text="a;" -l C++ --output-srcml-inner
<expr_stmt><expr><name>a</name></expr>;</expr_stmt>
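Because the markup only wraps the original text, the source can be
recovered from the example above by dropping the tags. The Python sketch
below shows this with the standard library. It illustrates the format,
not how srcml itself extracts source, and the helper name source_text is
hypothetical.

```python
import xml.etree.ElementTree as ET

# The unit shown in the example above, written as one string.
srcml_unit = (
    '<unit revision="1.0.0" language="C++">'
    "<expr_stmt><expr><name>a</name></expr>;</expr_stmt></unit>"
)


def source_text(xml_text):
    """Concatenate all text nodes of a srcML unit in document order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


print(source_text(srcml_unit))  # a;
```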
MARKUP OPTIONS
Optional line and column attributes are used to indicate the position
of an element in the original source code. Both the line and column
start at 1. The column position is based on the tab settings with a
default tab size of 8. Other tab sizes can be set using the --tabs option.
--position
Insert attributes for the start (line and column) and end (line
and column) of an element in the start tag. These attributes
have a default prefix of "pos" in the namespace
"http://www.srcML.org/srcML/position", e.g., <class
pos:start="15,1" pos:end="25,2">
--tabs=tabsize
Set the tab size. Default is 8. Use of this option automatically
turns on the position attributes.
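How the tab size can affect a column value is sketched below in Python.
It assumes that a tab advances to the next tab stop and that columns are
1-based, as stated above. The exact rule srcml applies is not spelled out
in this page, so treat this as an illustration only. The function name
column_of is hypothetical.

```python
def column_of(line, index, tabsize=8):
    """Return the 1-based column of line[index], expanding tabs to tab stops."""
    column = 1
    for ch in line[:index]:
        if ch == "\t":
            # Assumption: a tab moves to the next multiple of tabsize, plus one.
            column += tabsize - (column - 1) % tabsize
        else:
            column += 1
    return column


print(column_of("\tint a;", 1))             # 9
print(column_of("\tint a;", 1, tabsize=4))  # 5
print(column_of("int a;", 4))               # 5
```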
This set of options allows control over how preprocessing regions are
handled, i.e., whether parsing and markup occur. In all cases the
source is preserved.
--cpp Turn on parsing and markup of preprocessor statements in
non-C/C++ languages such as Java. Can also be enabled by defin-
ing a prefix for this cpp namespace URL, e.g.,
--xmlns:cpp="http://www.srcML.org/srcML/cpp".
--cpp-markup-if0
Markup #if 0 regions. The default is to preserve the source code
in these regions, without any markup. This option indicates that
the #if 0 regions should be treated as source code, and marked
up accordingly.
--cpp-no-markup-else
Only place source code in #else and #elif regions, leaving out
markup. The default is to markup these regions.
XML FORMAT
the srcML document by the declaration of the specific extension names-
pace. These flags make it easier to declare, and are an alternative way
to turn on options by declaring the URL for an option.
--xmlns=url
Set the url for the default namespace. The predefined URL is:
--xmlns=http://www.srcML.org/srcML/src
--xmlns:prefix=url
Set the namespace prefix PREFIX for the namespace URL. There is
a set of standard URLs for the elements in srcML, each with a
predefined prefix. The predefined URLs and prefixes are:
--xmlns:cpp=http://www.srcML.org/srcML/cpp
--xmlns:pos=http://www.srcML.org/srcML/position
METADATA OPTIONS
This set of options allows viewing and controlling various metadata in
srcML.
The following options allow viewing various metadata stored in the
srcML document.
-L, --list
List all the files in the srcML archive, then exit.
-i, --info
Display most metadata, except the unit count (file count) in a
srcML archive, then exit.
-I, --full-info
Display most metadata including the unit (file) count in a srcML
archive, then exit.
--show-language
Display language and exit.
--show-url
Display URL of the root element and exit.
--show-filename
Display the filename and exit.
--show-src-version
Display the source-code version attribute and exit.
--show-timestamp
Display the timestamp attribute and exit.
--show-hash
Display the hash attribute and exit.
--filename=filename
The value of the filename attribute is typically obtained from
the input filename. This option allows you to specify a differ-
ent filename for standard input or where the filename is not
contained in the input path.
--url=url
The url attribute on the root element can be defined. This is
purely descriptive and has no interpretation by srcml. It is
useful for specifying a directory or defining the source proto-
col.
-s version, --src-version=version
Set the value of the attribute version to version. This is a
purely-descriptive attribute, where the value has no interpreta-
tion by srcml. The attribute is applied to the root element, and
in the case of a srcML archive, it is also applied to each unit
in the archive.
--hash The value of the hash attribute is a SHA-1 hash generated based
on the contents of the source-code file. This is enabled by
default when working with srcML archives.
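For reference, a SHA-1 digest of file contents can be computed in Python
as sketched below. Which bytes srcml feeds to the hash (for example, the
raw file bytes versus decoded or normalized text) is not stated here, so
hashing the raw bytes is an assumption. The helper name sha1_of is
hypothetical.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


# Well-known SHA-1 test vector for the three bytes "abc".
print(sha1_of(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```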
--timestamp
Set the timestamp of the output srcML file to the last modified
time of the input source-code archive. This is the last modified
time based on the archive files.
EXAMPLES
srcml input.cpp
Create a srcML unit from input.cpp, using C++ parsing rules, and
output to standard out.
echo "int a;" | srcml -l C++
Create a srcML unit from standard input, using C++ parsing
rules, and output to standard out.
srcml --text="int a;\n" -l C++
Create a srcML unit from the expanded text, using C++ parsing
rules, and output to standard out.
srcml dir.xml --show-unit-count
Create a srcML archive from all files contained in the dir
directory, using their extensions to determine the markup pars-
ing rules, and output the number of units contained in the ar-
chive to standard out.
srcml input.java --cpp
Create a srcML unit from input.java, using Java parsing rules as
well as C++ parsing rules for preprocessor directives.
EXTRACTING SOURCE CODE
The following describe options that are only applicable for when the
srcml dir/ -o dir.xml
Create a srcML archive from all files contained in the dir
directory, using their extensions to determine the markup pars-
ing rules, and write the resulting srcML archive to dir.xml.
srcml archive.xml --to-dir=.
Re-create all files based on the srcML units in archive.xml,
using the current directory as the root directory.
TRANSFORMATIONS
--xpath=<expression>
Query each individual unit using the XPath expression.
The default prefix cannot be used in XPath expressions. Element names
must have a prefix, e.g., src, cpp, etc. If the path from the root is
not given, i.e., the expression does not start with '//...' or
'/src:unit/..', the context is assumed to be '//', e.g., 'src:name' is
the same as '//src:name', and 'count(src:name)' is the same as
'count(//src:name)'.
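The need for a prefix is a general property of namespaced XML and can be
seen with any XPath-capable tool. The Python sketch below uses the
standard library on a hand-written unit that declares the default srcML
namespace on the root (an assumption about the document shape). It does
not reproduce srcml's implicit '//' handling. The names SRC_NS and names
are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URL map; the URL is the predefined default srcML namespace.
SRC_NS = {"src": "http://www.srcML.org/srcML/src"}

# Hand-written sample unit, assuming the namespace is declared on the root.
unit = ET.fromstring(
    '<unit xmlns="http://www.srcML.org/srcML/src" revision="1.0.0" language="C++">'
    "<expr_stmt><expr><name>a</name></expr>;</expr_stmt></unit>"
)


def names(root):
    """Return the text of every src:name element below root."""
    return [n.text for n in root.findall(".//src:name", SRC_NS)]


print(names(unit))              # ['a']
print(unit.findall(".//name"))  # [] because the unprefixed name matches nothing
```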
By default, the result is a srcML archive where each unit is a query
result, marked with the original filename. As an alternative, the orig-
inal srcML can be preserved with the query results marked with an
attribute, wrapped with an element, or both. Note that the prefix and
url used for the namespace must be declared with the option --xmlns:.
--attribute prefix:name=value
Add the attribute prefix:name="value" to every XPath expression
result.
--element prefix:name
Wrap every XPath expression result with an element of the form
prefix:name. May be mixed with --attribute.
--xslt file|url
Apply a transformation from an XSLT file or url to each individ-
ual unit.
--xslt-param name="value"
Pass the string parameter name, with the UTF-8 encoded string
value, to the XSLT program.
--relaxng=file|url
Output individual units that match the RELAX NG file or url.
EXAMPLES
srcml a.cpp --xpath="//src:name" --attribute="q:foo=test"
--xmlns:q=mysite.net
Convert a.cpp to srcML and add the attribute q:foo=test to all
src:name elements as found by the XPath query. Output the
results to standard out.
srcml archive.xml --xpath "//src:unit/@filename"
ble. For non-CFG languages, i.e., C/C++, and with macros this may lead
to incorrect markup.
Line endings are normalized in XML formats including srcML.
BUGS
Libxml2 directly supports many encodings beyond UTF-8, UTF-16, and
ISO-8859-1 through iconv. However, the BOM (Byte Order Mark) immedi-
ately before the XML declaration may not be processed correctly by
srcml and by other libxml2-based tools (e.g., xmllint). Use the LE or
BE version of the encoding, e.g., UTF-32BE, UTF-32LE, instead.
Report bugs at http://www.srcml.org/support.html
AUTHORS
Written by Michael L. Collard, Michael Decker, Drew Guarnera, Brian
Bartman, and Heather Michaud.
COPYRIGHT
Copyright (C) 2013-2019 srcML, LLC. (www.srcML.org)
The srcML Toolkit is free software; you can redistribute it and/or mod-
ify it under the terms of the GNU General Public License as published
by the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
The srcML Toolkit is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of MER-
CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along
with the srcML Toolkit; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
December 2019 SRCML(1)