I had an interesting discussion with a co-worker yesterday about how to grab all the comments from an Objective-C source file. We tried to think about how to do that with a regex, which probably would have worked OK for simple files, but definitely wouldn’t have handled all the edge cases easily (comment markers inside string literals, for example).
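To make the edge-case problem concrete, here is a minimal sketch of the regex approach in C++. The function name findCommentsWithRegex and the pattern itself are illustrative assumptions, not something we actually wrote during that discussion. The pattern knows nothing about string literals, which is exactly where it goes wrong.

```cpp
#include <regex>
#include <string>
#include <vector>

// Naive approach (hypothetical helper): treat everything from "//" to the end
// of the line, or between "/*" and "*/", as a comment. The pattern has no
// notion of string literals, so "//" inside a string is matched as well.
std::vector<std::string> findCommentsWithRegex(const std::string &source) {
  static const std::regex commentPattern(
      R"(//[^\n]*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)");
  std::vector<std::string> matches;
  for (std::sregex_iterator it(source.begin(), source.end(), commentPattern),
       end;
       it != end; ++it) {
    matches.push_back(it->str());
  }
  return matches;
}
```

Given a line such as const char *url = "http://example.com"; // real comment, the first match starts at the // inside the string literal rather than at the real comment, so the reported text is wrong even though the pattern looks reasonable.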
I’ve been using clang’s awesome LibTooling library for writing frontend actions recently, so I thought I’d take a quick pass at writing a frontend action that can grab all the comments from a source file.
Setting up the environment for building frontend actions is covered in clang’s excellent “Tutorial for building tools using LibTooling and LibASTMatchers” documentation, so go do that if you haven’t already!
Next, I set up a new frontend action. I cloned llvm into the ~/clang-llvm folder:
cd ~/clang-llvm/llvm/tools/clang
mkdir tools/extra/commentparser
echo 'add_subdirectory(commentparser)' >> tools/extra/CMakeLists.txt
vim tools/extra/commentparser/CMakeLists.txt
Then the CMakeLists.txt should look like this:
set(LLVM_LINK_COMPONENTS support)

add_clang_executable(commentparser
  CommentParser.cpp
  )
target_link_libraries(commentparser
  clangTooling
  clangBasic
  clangASTMatchers
  )
Now we can start writing our frontend action. We need a few different elements: command line parsing in the main function, an ASTFrontendAction, and an ASTConsumer. Starting with the command line parsing and setup:
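#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"

using namespace clang::tooling;

// Category under which this tool's own command-line options are grouped.
static llvm::cl::OptionCategory MyToolCategory("My tool options");

int main(int argc, const char **argv) {
  // Newer clang releases replace this constructor with CommonOptionsParser::create().
  CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
  ClangTool Tool(OptionsParser.getCompilations(),
                 OptionsParser.getSourcePathList());
  return Tool.run(newFrontendActionFactory<FindCommentsAction>().get());
}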
Nothing too interesting is going on here yet, although note that we set up a new OptionCategory. This lets the options parser hide all the default clang options displayed in the help, which we don’t want for our tool.
Next is the ASTFrontendAction. All we are going to do here is create our ASTConsumer and forward the compiler’s ASTContext to it. Here’s that code:
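#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendAction.h"

class FindCommentsAction : public clang::ASTFrontendAction {
public:
  // Called once per source file; hands that file's ASTContext to our consumer.
  std::unique_ptr<clang::ASTConsumer> CreateASTConsumer(
      clang::CompilerInstance &Compiler, llvm::StringRef InFile) override {
    return std::unique_ptr<clang::ASTConsumer>(
        new FindCommentsConsumer(&Compiler.getASTContext()));
  }
};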
Now for the interesting part: our ASTConsumer. This is the class in which we can visit AST nodes. If we wanted to, we could perform an action on every IfStmt, ForStmt, EnumDecl, etc. To handle comments, all we need to visit is the translation unit. A TranslationUnitDecl is always the top-level node in the Abstract Syntax Tree.[1]
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include <iostream>

class FindCommentsConsumer : public clang::ASTConsumer {
public:
  // The context is not stored; HandleTranslationUnit receives it directly.
  explicit FindCommentsConsumer(clang::ASTContext *Context) {}

  // This is called after the full translation unit is parsed
  void HandleTranslationUnit(clang::ASTContext &Context) override {
    auto comments = Context.getRawCommentList().getComments();
    for (auto comment : comments) {
      std::cout << comment->getRawText(Context.getSourceManager()).str() << std::endl;
    }
    std::cout << "Finished parsing for comments" << std::endl;
  }
};
That’s all we need for this frontend action: we just print out the comments that we find in each translation unit.
For testing, I’m going to use a simple source file, test.cpp:
/** Other comment */
// This is the main method
/// Documentation comment
/* block comment */
int f(int x) {
  int result = (x / 42); // end of line comment
  return result;
}
/** this is a block comment */
int main(int argc, char** argv) {
  return 0;
}
Compiling and running the frontend action:
ninja commentparser
./commentparser test.cpp --
/** Other comment */
/// Documentation comment
/** this is a block comment */
Finished parsing for comments
Wait, what happened to the other comments in the file? Clang treats some comments differently from others. Comments that start with /** or /// are treated as documentation comments. By default, clang only parses documentation comments.[2]
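The prefix rule can be sketched as a small standalone function. This is a simplified illustration, not clang's actual implementation: the names CommentClass and classifyComment are hypothetical, and the sketch also counts the /*! and //! openers, which clang recognizes as documentation comments alongside /** and ///.

```cpp
#include <string>

enum class CommentClass { NotAComment, Ordinary, Documentation };

// Simplified sketch (hypothetical helper): classify one comment by its
// opening characters only.
CommentClass classifyComment(const std::string &text) {
  if (text.size() < 2 || text[0] != '/')
    return CommentClass::NotAComment;
  if (text[1] == '/') {
    // Line comment: "///" and "//!" open documentation comments.
    if (text.size() >= 3 && (text[2] == '/' || text[2] == '!'))
      return CommentClass::Documentation;
    return CommentClass::Ordinary;
  }
  if (text[1] == '*') {
    // Block comment: it must be closed with "*/".
    if (text.size() < 4 || text.compare(text.size() - 2, 2, "*/") != 0)
      return CommentClass::NotAComment;
    // "/**" and "/*!" open documentation comments; in this sketch the empty
    // comment "/**/" is treated as ordinary.
    if (text.size() > 4 && (text[2] == '*' || text[2] == '!'))
      return CommentClass::Documentation;
    return CommentClass::Ordinary;
  }
  return CommentClass::NotAComment;
}
```

Applied to test.cpp, this classifies /** Other comment */, /// Documentation comment and /** this is a block comment */ as documentation comments and the remaining three as ordinary, which matches the three comments printed in the first run.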
Luckily, there’s a command line flag that overrides this default and parses all comments, -fparse-all-comments:
./commentparser test.cpp -- -fparse-all-comments
/** Other comment */
// This is the main method
/// Documentation comment
/* block comment */
// end of line comment
/** this is a block comment */
Finished parsing for comments
That gives us all the comments in the translation unit! Note that if we wanted to compile a more complicated class with imports, or a project that has multiple files, we’d have to use a compilation database.[3] Check out the full source code for this frontend action in CommentParser.cpp.