葛文馨 魏永山



【Abstract】: To detect plagiarism of Structured Query Language (SQL) statements in database laboratory courses, existing code detection techniques were applied, but they did not give the expected results because SQL statements are short. A SQL statement plagiarism detection algorithm based on coding habits is therefore proposed. The algorithm obtains and classifies each student's historical coding data, determines the category of the code to be detected, and extracts features from that code and from the historical code of the same category according to the student's coding habits. It then builds feature matrices, compares the similarity between the codes, and filters the code suspected of plagiarism to judge whether it was written by that student. Experimental results show that the algorithm can effectively identify plagiarism by students and also addresses the difficulty of detecting plagiarism in short code.
【Key words】: Coding habits; Code plagiarism detection; Naive Bayes; SQL
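As an illustration of the general idea only, the following Python sketch maps SQL statements to coding-habit feature vectors and compares a submission with a student's historical profile. Everything in it is an assumption made for illustration and is not taken from the paper: the five features (keyword capitalization, spacing after commas, spacing around `=`, trailing semicolon, multi-line layout), the keyword list, the function names, and the similarity measure (one minus the mean absolute difference). The category classification step, which the keywords suggest is handled with naive Bayes, is omitted here.

```python
import re

# Hypothetical keyword list; the paper does not enumerate the keywords it uses.
SQL_KEYWORDS = {"select", "from", "where", "group", "by", "order", "having",
                "join", "on", "and", "or", "as", "insert", "update", "delete"}


def _ratio(part, whole):
    """Return part / whole, or 0.0 when the construct does not occur at all."""
    return part / whole if whole else 0.0


def habit_features(sql):
    """Map one SQL statement to a vector of assumed habit features in [0, 1]."""
    words = re.findall(r"[A-Za-z_]+", sql)
    keywords = [w for w in words if w.lower() in SQL_KEYWORDS]
    return [
        # Share of keywords written in upper case.
        _ratio(sum(w.isupper() for w in keywords), len(keywords)),
        # Share of commas followed by whitespace.
        _ratio(len(re.findall(r",\s", sql)), sql.count(",")),
        # Share of '=' signs surrounded by whitespace.
        _ratio(len(re.findall(r"\s=\s", sql)), sql.count("=")),
        # Whether the statement ends with a semicolon.
        1.0 if sql.rstrip().endswith(";") else 0.0,
        # Whether the statement is spread over several lines.
        1.0 if "\n" in sql.strip() else 0.0,
    ]


def habit_profile(history):
    """Average the feature vectors of a student's historical statements."""
    rows = [habit_features(s) for s in history]
    return [sum(col) / len(rows) for col in zip(*rows)]


def habit_similarity(a, b):
    """Similarity in [0, 1]: one minus the mean absolute feature difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    history = [
        "SELECT name, age FROM student WHERE age = 20;",
        "SELECT id, name FROM course WHERE credit = 3;",
    ]
    profile = habit_profile(history)
    for candidate in ("SELECT sno, grade FROM sc WHERE grade = 90;",
                      "select sno,grade from sc where grade=90"):
        print(candidate, habit_similarity(profile, habit_features(candidate)))
```

In this sketch a statement whose layout matches the student's history scores close to 1, while one written in a visibly different style scores lower. A real detector would need a threshold chosen from data and a way to distinguish "construct absent" from "habit absent", which the `_ratio` helper deliberately glosses over.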
0 Introduction
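With the rapid development of computer technology, the mode of education has changed considerably. Education is no longer limited to traditional textbook instruction alone; it has gradually diversified to combine textbooks, the Internet, and other media, and online education has emerged as a result. Students can practice on course-support websites to consolidate the knowledge points learned in class. Some exercises on these websites require students to write code; such exercises are designed to help students master the relevant knowledge points and to improve their coding skill and fluency. However, as sharing code and copying and pasting become ever easier, this goal is increasingly hard to achieve and plagiarism is growing instead, especially on online education websites where students and teachers lack direct contact. Because it is difficult to supervise how students complete exercises in online education, it is hard to determine whether they have actually mastered the material and can write the corresponding code fluently and accurately.……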