Don’t you think it would be useful if a tool could take a code snippet as a query and return a set of recommended code snippets from large code bases? Well, there is one. Facebook has just released Aroma, a code-to-code search and recommendation tool that helps developers gain insights from large code bases.
In this post, we take a look at the key features of Aroma.
- Aroma is a tool for code recommendation via structural code search. It indexes a large code base, which could include thousands of open-source projects.
- Aroma takes a partial code snippet as input and searches the indexed method bodies. It then clusters and intersects the search results to recommend a small set of code snippets that contain the query snippet and that appear in several programs across the code base.
- Aroma indexes the code corpus as a sparse matrix in two steps:
- Featurization: Aroma parses each method in the corpus to create its parse tree, then extracts a set of structural features from the parse tree of each method.
- Vectorization: The feature vectors of all method bodies are stored as a sparse matrix whose jth row is the feature vector of the jth method body in the corpus.
- In the recommendation stage, given a query code snippet, Aroma runs the following phases to create recommendations:
- Light-weight search: First, Aroma featurizes the query code’s parse tree into a sparse vector and takes the dot product of this vector with the feature matrix. The top n1 method bodies with the highest dot products are retrieved as the candidate set for recommendation.
- Prune and rerank: Aroma then prunes the code snippets retrieved in the previous phase and reranks them based on their similarity to the query snippet.
- Each snippet that Aroma recommends is generated by intersecting several similar-looking code snippets, which increases the likelihood that the recommendation is appropriate.
- Aroma does NOT require mining common coding patterns or idioms ahead of time. Because of this, it is not limited to a fixed set of mined patterns and can retrieve new and interesting code snippets on the fly.
- Aroma is fast enough to use in real time. It first retrieves a small set of snippets using approximate search, and then performs the heavy-duty pruning and clustering operations only on this set. This allows it to create recommended code snippets for a given query from a large code base containing millions of methods within a couple of seconds.
- Although Aroma is a code recommendation engine, it can also be used to perform efficient and precise code-to-code structural search.
- On average, Aroma takes 1.6 seconds to create recommendations for a query code snippet. It has been evaluated using code snippets obtained from 5,417 GitHub Java Android projects and code snippets from Stack Overflow.
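The indexing steps (featurization and vectorization) and the light-weight search phase can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Aroma's implementation. The tuple-based parse trees, the two simplified feature kinds (node labels and parent>child edges), and the helper names `featurize`, `vectorize` and `light_weight_search` are all hypothetical. The sparse matrix is also stored as one dictionary per row rather than with a real sparse-matrix library.

```python
from collections import Counter
import heapq
from typing import Optional, Union

# Hypothetical toy parse tree: a leaf is a string token; an inner node is a
# tuple (label, child1, child2, ...). Real Aroma parses actual method bodies.
Tree = Union[str, tuple]


def featurize(tree: Tree, parent: Optional[str] = None) -> Counter:
    """Collect a simplified multiset of structural features.

    Assumption: only two feature kinds are used here, the node label itself
    and a "parent>label" edge. Aroma's real feature set is richer.
    """
    label = tree if isinstance(tree, str) else tree[0]
    features = Counter([label])
    if parent is not None:
        features[f"{parent}>{label}"] += 1
    if not isinstance(tree, str):
        for child in tree[1:]:
            features.update(featurize(child, label))  # adds counts
    return features


def vectorize(corpus: list) -> tuple:
    """Build a sparse matrix: row j maps column ids to feature counts of method j."""
    vocab = {}  # feature string -> column index
    rows = []
    for tree in corpus:
        row = {}
        for feature, count in featurize(tree).items():
            column = vocab.setdefault(feature, len(vocab))
            row[column] = count
        rows.append(row)
    return vocab, rows


def light_weight_search(query: Tree, vocab: dict, rows: list, n1: int) -> list:
    """Return indices of the n1 method bodies with the highest dot product."""
    # Query features that never occur in the corpus cannot contribute.
    q = {vocab[f]: c for f, c in featurize(query).items() if f in vocab}
    scores = [sum(c * row.get(col, 0) for col, c in q.items()) for row in rows]
    return heapq.nlargest(n1, range(len(rows)), key=scores.__getitem__)


corpus = [
    ("method", ("if", ("call", "isEmpty"), ("return", "null")), ("call", "close")),
    ("method", ("for", ("call", "add")), ("return", "list")),
    ("method", ("if", ("call", "isEmpty"), ("throw", "error"))),
]
vocab, rows = vectorize(corpus)
print(light_weight_search(("if", ("call", "isEmpty")), vocab, rows, n1=2))  # [0, 2]
```

In this toy corpus, method 0 scores slightly above method 2 only because it contains a second `call` node. This illustrates why the dot-product search is treated as approximate and is followed by pruning and reranking.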
More details about Aroma here