I had an interesting discussion with a co-worker yesterday about how to grab all the comments from an Objective-C source file. We thought about doing it with a regex, which probably would have worked OK, but definitely wouldn’t have handled all the edge cases easily.
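To make one of those edge cases concrete, here is a minimal sketch. The naiveLineComments helper and the sample line are made up for illustration; this is not code we actually wrote. A regex that treats everything from // to the end of the line as a comment gets fooled by a URL inside a string literal:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every substring matched by a naive
// "two slashes up to the end of the line" pattern.
std::vector<std::string> naiveLineComments(const std::string &source) {
  static const std::regex pattern(R"(//[^\n]*)");
  std::vector<std::string> found;
  for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
       it != end; ++it) {
    found.push_back(it->str());
  }
  return found;
}

// Usage sketch: the match starts inside the string literal, so the real
// comment is never reported on its own.
inline void demoNaiveRegex() {
  const std::string line =
      "NSString *url = @\"http://example.com\"; // real comment\n";
  const auto found = naiveLineComments(line);
  assert(found.size() == 1);
  assert(found[0] == "//example.com\"; // real comment");
}
```

A pattern can be patched to skip string literals, but then block comments, escaped quotes and line continuations each need their own patch, which is why letting the compiler's lexer do the work is appealing.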

I’ve been using clang’s awesome libTooling library for writing frontend actions recently so I thought I’d try and take a quick pass at writing a frontend action that can grab all the comments from a source file.

Setting up the environment for building frontend actions is covered in clang’s awesome “Tutorial for building tools using LibTooling and LibASTMatchers” documentation, so go do that if you haven’t already!

Next, I set up a new frontend action. I had cloned llvm into the ~/clang-llvm folder:

cd ~/clang-llvm/llvm/tools/clang
mkdir tools/extra/commentparser
echo 'add_subdirectory(commentparser)' >> tools/extra/CMakeLists.txt
vim tools/extra/commentparser/CMakeLists.txt

Then the CMakeLists.txt should look like this:

set(LLVM_LINK_COMPONENTS support)

add_clang_executable(commentparser
  CommentParser.cpp
)
target_link_libraries(commentparser
  clangTooling
  clangBasic
  clangASTMatchers
)

Now we can start writing our frontend action. We need a few different elements: command line parsing in the main function, an ASTFrontendAction, and an ASTConsumer. Starting with the command line parsing and setup:

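#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/CommandLine.h"

using namespace clang::tooling;

// FindCommentsAction (shown later in this post) must be defined above main.
static llvm::cl::OptionCategory MyToolCategory("My tool options");
int main(int argc, const char **argv) {
  CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
  ClangTool Tool(OptionsParser.getCompilations(),
                 OptionsParser.getSourcePathList());
  return Tool.run(newFrontendActionFactory<FindCommentsAction>().get());
}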

Nothing too interesting is going on here yet, although note that we set up a new OptionCategory. This lets us hide all the default clang options displayed in the help, since we don’t want those for our tool.

Next is the ASTFrontendAction. All we are going to do here is create our ASTConsumer and forward the compiler’s ASTContext to it. Here’s that code:

class FindCommentsAction : public clang::ASTFrontendAction {
public:
  // Called by clang for each source file the tool processes.
  std::unique_ptr<clang::ASTConsumer> CreateASTConsumer(
      clang::CompilerInstance &Compiler, llvm::StringRef InFile) override {
    return std::unique_ptr<clang::ASTConsumer>(
        new FindCommentsConsumer(&Compiler.getASTContext()));
  }
};

Now for the interesting part: our ASTConsumer. This is the class in which we can visit AST nodes. If we wanted to, we could perform an action on every IfStmt, ForStmt, EnumDecl, etc. To handle comments, though, all we need to visit is the translation unit. A TranslationUnitDecl is always the top-level node in the abstract syntax tree.[1]

#include <iostream>

class FindCommentsConsumer : public clang::ASTConsumer {
public:
  // The context pointer is not stored; HandleTranslationUnit receives it again.
  explicit FindCommentsConsumer(clang::ASTContext *Context) {}

  // This is called after the full translation unit is parsed
  void HandleTranslationUnit(clang::ASTContext &Context) override {
    auto comments = Context.getRawCommentList().getComments();
    for (const auto *comment : comments) {
      // Print the comment exactly as it appears in the source file
      std::cout << comment->getRawText(Context.getSourceManager()).str() << std::endl;
    }
    std::cout << "Finished parsing for comments" << std::endl;
  }
};

That’s all we need for this frontend action. We are just going to print out the comments that we find in each translation unit.

For testing, I’m going to use a simple source file test.cpp:

/** Other comment */
// This is the main method
/// Documentation comment
/* block comment */

int f(int x) {
  int result = (x / 42); // end of line comment
  return result;
}
/** this is a block comment */

int main(int argc, char** argv) {
  return 0;
}

Compiling and running the frontend action:

ninja commentparser
./commentparser test.cpp --
/** Other comment */
/// Documentation comment
/** this is a block comment */
Finished parsing for comments

Wait, what happened to the other comments in the file? Clang treats some comments differently from others. Comments that start with /** or /// are treated as documentation comments. By default, clang only parses documentation comments.[2]
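As a rough illustration of that rule, here is a small standalone sketch. The looksLikeDocComment helper is hypothetical and is not part of clang; it only checks the two prefixes mentioned above, whereas clang's own classification also recognizes other forms such as //! and /*!.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not clang's implementation: returns true when the raw
// comment text starts with one of the two documentation prefixes named above.
bool looksLikeDocComment(const std::string &raw) {
  return raw.compare(0, 3, "///") == 0 || raw.compare(0, 3, "/**") == 0;
}

// Usage sketch with comments from test.cpp: only the documentation-style
// comments pass the check, matching what the tool printed.
inline void demoDocCommentCheck() {
  assert(looksLikeDocComment("/** Other comment */"));
  assert(looksLikeDocComment("/// Documentation comment"));
  assert(!looksLikeDocComment("// This is the main method"));
  assert(!looksLikeDocComment("/* block comment */"));
}
```

Running the five distinct comment styles from test.cpp through this check keeps exactly the three that the tool printed and drops the plain // and /* */ ones.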

Luckily, there’s a command line flag to override this and parse all comments, -fparse-all-comments:

./commentparser test.cpp -- -fparse-all-comments

/** Other comment */
// This is the main method
/// Documentation comment
/* block comment */
// end of line comment
/** this is a block comment */
Finished parsing for comments

That gives us all the comments in the translation unit! Note that if we wanted to compile a more complicated class with imports, or a project that has multiple files, we’d have to use a compilation database.[3] Check out the full source code for this frontend action in CommentParser.cpp.