Authorship identification is an essential technique for identifying the contributors to source code, and it supports a range of security-related applications. It is widely used in code analysis and security-related tasks such as detecting ghostwriting and resolving copyright disputes, and it has been an active research topic in plagiarism detection for several decades. Most identification methods consider code samples written by a single author. However, real-world software is usually developed by teams, so identifying the multiple authors of source code remains a major research challenge. To address this challenge, we propose a deep learning approach that identifies the multiple authors of source code by analyzing their coding styles. We apply several machine learning and deep learning algorithms, namely a Deep Neural Network (DNN), a Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), to detect multiple authors from source code. The code2seq model over the Abstract Syntax Tree (AST) is used to extract a structural representation of the source code, which helps the algorithms achieve better performance. In our evaluation, the model identifies the multiple contributors of source code with high accuracy: the DNN outperforms the other algorithms, achieving 96.7% classification accuracy.
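code2seq represents a code snippet as a bag of path contexts: pairs of AST leaf tokens joined by the sequence of node types along the path through their lowest common ancestor. The sketch below illustrates that idea using Python's standard `ast` module on Python source. It is a simplified, hypothetical stand-in and not the extractor this project uses. The function name `path_contexts`, the `max_length` cap, the choice of leaf nodes (`Name`, `arg`, `Constant`), and the `^` (up) / `_` (down) path notation are all assumptions made for illustration. It requires Python 3.8 or later.

```python
import ast
from itertools import combinations


def path_contexts(source, max_length=8):
    """Return (token, path, token) triples between pairs of AST leaves.

    Hypothetical, simplified code2seq-style extraction over Python source;
    the project's own extractor and target language may differ.
    """
    tree = ast.parse(source)
    leaves = []  # (token, list of AST nodes from the root down to the leaf)

    def visit(node, ancestors):
        ancestors = ancestors + [node]
        token = None
        if isinstance(node, ast.Name):        # identifiers in expressions
            token = node.id
        elif isinstance(node, ast.arg):       # function parameters
            token = node.arg
        elif isinstance(node, ast.Constant):  # literals
            token = repr(node.value)
        if token is not None:
            leaves.append((token, ancestors))
        for child in ast.iter_child_nodes(node):
            visit(child, ancestors)

    visit(tree, [])

    contexts = []
    for (tok_a, path_a), (tok_b, path_b) in combinations(leaves, 2):
        # Count shared ancestors, compared by identity rather than node type.
        shared = 0
        while (shared < min(len(path_a), len(path_b))
               and path_a[shared] is path_b[shared]):
            shared += 1
        # Walk up from leaf A to the lowest common ancestor, then down to B.
        up = path_a[shared - 1:][::-1]
        down = path_b[shared:]
        if len(up) + len(down) > max_length:
            continue  # skip long paths between distant leaves
        path = "^".join(type(n).__name__ for n in up)
        path += "".join("_" + type(n).__name__ for n in down)
        contexts.append((tok_a, path, tok_b))
    return contexts
```

For example, `path_contexts("def add(a, b):\n    return a + b\n")` yields six contexts, including `("a", "Name^BinOp_Name", "b")` for the two operands of the addition. A model in the spirit of code2seq would then embed such triples, so an author's recurring structural habits surface as frequently repeated paths, independent of the exact identifier names used.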