Hu Xinying, He Yu, Sun Guangzhong
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
Abstract: A new cognitive diagnostic framework is proposed to evaluate students’ theoretical and practical abilities in computer science education. Based on a probabilistic graphical model, students’ coding ability is introduced, and students’ theoretical and practical abilities are then modeled. A parallel optimization algorithm is proposed to train the model efficiently. Experimental results on multiple data sets show that the proposed model significantly improves MAE and RMSE over the competing methods. The proposed model provides more accurate and comprehensive analysis results for computer science education.
Keywords: cognitive diagnosis; probabilistic graphical model; educational data mining
Students acquire knowledge in schools from diverse courses, and teachers give students assignments or tests to practice the skills taught in those courses. Giving accurate and rapid feedback to students during their daily practice plays an important role in the teaching process. Rapid feedback has been shown to improve students’ performance: in a controlled experiment, students’ final grades improved when feedback was delivered quickly, but not when it was delayed by 24 hours[1]. In the traditional teaching process, scores or grades are provided as feedback for students. However, students with the same score may have different cognitive processes, and a single score cannot distinguish the cognitive differences between them. With the rapid development of information technology in education, we hope to analyze students’ various abilities in courses as well as their learning characteristics.
Recent decades have witnessed the development of educational data mining (EDM), which refers to the mining of valuable information from the data collected during the education process. Cognitive diagnosis is one of the key applications of EDM. It refers to analyzing students’ answers to a set of questions in order to infer their mastery of knowledge concepts. Nowadays, people are dissatisfied with giving each student a simple test score or grade to indicate their ability. They prefer methods that provide diagnostic information and report the cognitive structure of students. Cognitive diagnostic models (CDMs) can be used to model students, estimate their abilities and predict their performance on each question. CDMs allow us to understand the cognitive structure of students precisely and provide a basis for teachers’ personalized guidance. At present, cognitive diagnosis has achieved good performance in the evaluation of students in traditional subjects such as mathematics and English[2, 3].
Although cognitive diagnosis has performed well in evaluating students in traditional subjects, it still has shortcomings in computer science education. The reason is that computer science differs from traditional subjects. In addition to theoretical knowledge concepts, training in programming is also essential, which means cultivating students’ ability to turn theoretical knowledge concepts into code. In computer science education, the ability to write code is the bridge that applies knowledge concepts to real life and solves practical problems. Therefore, cognitive diagnosis in computer science education must capture students’ practical abilities and help students improve their coding skills. However, existing cognitive diagnosis methods only model students’ mastery of theoretical knowledge concepts and ignore the ability to use those concepts in practice. To solve this problem, we propose a new cognitive diagnostic framework for computer science education (CDF-CSE), which models students’ theoretical knowledge concepts and programming ability at the same time. With this framework, we can evaluate students’ programming ability from their practical work, thereby helping students learn and improve their coding. The proposed method can be applied in computer science education to diagnose students comprehensively and to explore their latent factors and characteristics in various aspects.
To the best of our knowledge, this is the first attempt to combine students’ theoretical learning with their practical abilities in cognitive diagnosis. The proposed method models students’ programming abilities to bring cognitive diagnosis to computer science education, and it models theoretical and practical abilities at the same time to predict students’ performance and analyze students comprehensively. We design an effective algorithm for parameter estimation and conduct extensive experiments on multiple data sets (including two data sets collected from computer science courses at the University of Science and Technology of China) to demonstrate the effectiveness of our framework.
In educational psychology, many cognitive diagnostic models[3] have been developed to mine students’ skill proficiency (mainly related to the mastery of theoretical knowledge concepts). The study of CDMs covers two families, discrete and continuous. The fundamental discrete CDM is the deterministic inputs, noisy “and” gate (DINA) model[4-6]. DINA describes a student by a binary vector, each value of which indicates whether the student has mastered a certain skill. In addition, DINA introduces a skill matrix Q to represent the skills required for each problem. The Q-matrix guarantees the interpretability of the diagnosis results. Based on DINA, higher-order DINA (HO-DINA), which contains a higher-order cognitive parameter to represent the overall abilities of students, was proposed[7]. Besides, to meet the needs of processing large-scale data, a generalization of DINA, called G-DINA, has appeared[6]. Although discrete CDMs are interpretable, their diagnosis results are usually not very accurate.
For continuous CDMs, the basic method is item response theory (IRT)[8], which characterizes a student by a continuous variable that corresponds to the student’s latent trait and uses a logistic function to model the probability that a student correctly solves a problem. A single latent trait only shows the general cognitive status of a student. Therefore, multidimensional IRT has been proposed to describe students’ skill proficiency comprehensively. Multidimensional IRT models are divided into compensatory (MIRT-C) and non-compensatory (MIRT-NC) models[3]. MIRT-C assumes that skills a student has not mastered can be made up for by other related skills, whereas MIRT-NC assumes the opposite. Continuous models describe students more accurately than discrete models, but their assumptions may not be suitable for computer science education. Furthermore, neither family is suitable for subjective questions.
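As a minimal illustration of the logistic response function mentioned above, the sketch below uses the widely used two-parameter logistic form. The names `theta`, `a` and `b` (ability, discrimination and difficulty) are conventional labels assumed here for illustration; they are not notation from this paper.

```python
import math


def irt_probability(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Probability of a correct answer under a two-parameter logistic IRT model.

    theta: latent trait of the student (assumed name)
    a:     item discrimination (assumed name)
    b:     item difficulty (assumed name)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A student whose trait equals the item difficulty answers correctly half the time.
p_equal = irt_probability(theta=0.0, a=1.0, b=0.0)
# A stronger student has a higher probability on the same item.
p_strong = irt_probability(theta=2.0, a=1.0, b=0.0)
```

Because a single `theta` summarizes the student, this form cannot separate mastery of individual knowledge concepts, which is the limitation the multidimensional variants address.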
Based on cognitive diagnosis results, students’ performance on questions that require specific skills can be predicted[9]. Besides, some researchers have analyzed the impact of objective and subjective factors on students’ question answering process[10], and others have tried to visualize the results of cognitive diagnosis so that further analyses become more convenient[11].
Some studies have attempted to apply matrix factorization (MF), as used in recommendation systems, to cognitive diagnosis. The basic idea is to treat students as users, questions as items, and test scores as users’ ratings of items. The score matrix can then be factorized into student vectors and question vectors, which are used to predict a student’s score on new questions. Related work includes using singular value decomposition (SVD) and other factor models to model students[12]. Some researchers compared MF techniques with regression methods for predicting students’ performance[13]. For MOOCs, an MF-based approach was proposed to model learning preferences[14]. In addition, some work applied non-negative matrix factorization to infer the Q-matrix[15, 16], and some scholars used relational MF to model students in an intelligent tutoring system[17]. Despite these many attempts, the parameters obtained by MF are hard to interpret compared with those of dedicated diagnostic models. We do not know what kind of student information the user vector represents, nor which characteristics the question vector corresponds to. Although matrix factorization can achieve good performance in predicting students’ scores, it still cannot give us sufficient diagnostic information.
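The basic idea of factorizing a score matrix can be sketched with a truncated SVD. This is only an illustration on a small, fully observed toy matrix: the function name `low_rank_scores` and the data are hypothetical, and real score matrices with missing entries would need a factorization method that handles unobserved scores.

```python
import numpy as np


def low_rank_scores(R: np.ndarray, d: int) -> np.ndarray:
    """Rank-d reconstruction of a fully observed score matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    student_vecs = U[:, :d] * s[:d]   # one d-dimensional vector per student
    question_vecs = Vt[:d, :]         # one d-dimensional vector per question
    return student_vecs @ question_vecs


# Toy scores: 3 students x 3 questions, every row is a multiple of [1, 0.5, 0].
R = np.array([[1.0, 0.5, 0.0],
              [0.8, 0.4, 0.0],
              [0.2, 0.1, 0.0]])
R_hat = low_rank_scores(R, d=1)
```

The latent dimensions in `student_vecs` and `question_vecs` carry no predefined meaning, which illustrates the interpretability problem described above.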
In this section, we introduce our cognitive diagnostic framework for computer science education (CDF-CSE). In existing cognitive diagnostic models, students’ proficiency in skills, or knowledge concepts, refers to their ability to use these skills to solve theoretical problems. In this paper, we call this theoretical ability. In computer science education, we also need to consider the ability of students to turn knowledge concepts into code. Therefore, our model captures students’ theoretical abilities as well as their experimental abilities (abilities to write code). In addition, we add a parameter that indicates the student’s overall programming ability rather than mastery of a specific knowledge concept. We introduce the model in detail in the following subsections.
It is necessary to formalize our problem first. We assume that there are $M$ students in a course, in which the teacher teaches $K$ skills and assigns $N_i$ theoretical questions and $N_e$ experiments. For homework or exam questions in the course, the score matrix $R$ has $M$ rows and $N_i$ columns. $R_{ji}$ represents the score of student $j$ on question $i$, where $j=1,2,\ldots,M$, $i=1,2,\ldots,N_i$, and $R_{ji}\in[0,1]$. For the programming experiments in the course, let $R'_{je}$ denote the score of student $j$ on experiment $e$, where $e=1,2,\ldots,N_e$, and $R'_{je}\in[0,1]$. The higher the values of $R_{ji}$ and $R'_{je}$, the better the student’s performance. Let $Q$ be the indicator matrix of the knowledge concepts examined by each theoretical question. $Q$ has $N_i$ rows and $K$ columns, and $q_{ik}$ indicates whether question $i$ examines knowledge concept $k$, where $k=1,2,\ldots,K$. Let $Q'$ be the matrix that indicates the skills examined by each experiment. $Q'$ has $N_e$ rows and $K$ columns, and $q'_{ek}$ indicates whether experiment $e$ examines knowledge concept $k$. Specifically, $q_{ik}=1$ when question $i$ requires knowledge concept $k$, and $q_{ik}=0$ otherwise. Similarly, $q'_{ek}=1$ means that knowledge concept $k$ is needed for solving experiment $e$, and $q'_{ek}=0$ means that it is not needed. Then we normalize the matrices $Q$ and $Q'$.
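The exact normalization is not spelled out here. One plausible choice, assumed purely for illustration, is to scale each row so that it sums to 1, which keeps any weighted combination of masteries in $[0,1]$. The helper name `normalize_rows` and the toy matrix are hypothetical.

```python
import numpy as np


def normalize_rows(Q: np.ndarray) -> np.ndarray:
    """Scale each row of a binary indicator matrix so that it sums to 1.

    Assumption: row-sum normalization; the paper's own formula may differ.
    """
    row_sums = Q.sum(axis=1, keepdims=True)
    # Leave all-zero rows untouched to avoid division by zero.
    safe = np.where(row_sums == 0, 1.0, row_sums)
    return Q / safe


# Two questions over K = 3 knowledge concepts.
Q = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)
Q_norm = normalize_rows(Q)
```

Under this choice, a question that examines two concepts weights each of them by 0.5, while a single-concept question keeps weight 1.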
Given $R$, $R'$, $Q$ and $Q'$, our goal is to perform cognitive diagnosis for students in computer science education, which is divided into three parts:
(Ⅰ) Diagnose the student’s overall programming ability $c_j$ in the proposed model.
(Ⅱ) Find the theoretical mastery $\alpha_{jk}$ and the experimental mastery $\beta_{jk}$ of student $j$ for a certain skill $k$.
(Ⅲ) Predict the performance of students on a new theoretical problem $i$ or experimental problem $e$ that requires some skills to solve. The predicted performance of student $j$ on theoretical problem $i$ is denoted by $\eta_{ji}$. Similarly, the predicted performance of student $j$ on experiment $e$ is denoted by $\eta'_{je}$.
The above diagnostic targets are all valued in [0,1], where 1 means that the student has completely mastered the skill or question, 0 means the opposite.
Programming Ability: To evaluate students’ coding abilities, we draw on a result from educational psychology: each person has a higher-order latent trait, which represents the person’s general ability to learn something[7]. We model this higher-order latent trait as a parameter $c_j$ that describes the programming ability of student $j$. That is, $c_j$ is not tied to any particular skill; it captures the ability to program itself. Each student $j$ has an independent parameter $c_j$ that indicates the student’s ability to write programs.
Mastery and application of skills: According to the problem definition, $\alpha_{jk}$ is the ability of student $j$ to use skill $k$ in theoretical problems, and $\beta_{jk}$ is the ability of student $j$ to use skill $k$ in experiments (e.g., writing code). In the proposed model, we assume that the abilities of a student in different skills are independent of each other, and that the abilities of different students are also independent of each other. However, the two abilities $\alpha_{jk}$ and $\beta_{jk}$ for the same skill should be related. Based on experience gained in teaching, we believe that students can apply skills in experiments after they have mastered them in theory. In summary, we propose the following assumption:
Assumption 3.1 The experimental ability $\beta_{jk}$ of student $j$ in skill $k$ is directly proportional to the student’s theoretical mastery $\alpha_{jk}$ of the skill and to the student’s basic coding ability $c_j$.
In other words, we believe that a person’s experimental ability depends on his/her theoretical mastery of the corresponding knowledge concept and is limited by his/her basic programming ability. We write this hypothesis as:
$$\beta_{jk} = c_j \alpha_{jk}$$
Problem mastery: As mentioned above, $\eta_{ji}$ is the student’s mastery of a theoretical question and $\eta'_{je}$ is the student’s mastery of an experiment. Traditional cognitive diagnostic models treat the questions as independent of each other. Even if two questions examine the same knowledge concept, we do not assume a direct relationship between them (the relationship between questions through knowledge concepts is provided by the $Q$ matrix). Traditional CDMs assume that a student’s mastery of a problem is related to the knowledge concepts the student has learned and the knowledge concepts required to answer the question[18]. In actual courses, each question requires one or more knowledge concepts. When a student solves a question completely or partially, the student has successfully used the specific knowledge concepts required by the question, and we consider that the student has mastered or partially mastered those knowledge concepts. Based on the above analysis, we define students’ mastery of problems (and of experiments) through the following assumption:
Assumption 3.2 The mastery of student $j$ of theoretical problem $i$ is related to the student’s mastery of the knowledge concepts examined in the problem, and the mastery of an experimental problem is calculated from the theoretical ability in the knowledge concepts needed for the problem.
That is, we assume that the performance of a student on a problem is directly proportional to the student’s mastery of the knowledge concepts examined in the problem. A student performs well when he/she is proficient in those knowledge concepts. Mathematically, the student’s mastery of the theoretical problem is:
and the student’s mastery of the experimental problem is written as:
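The two mastery expressions are not reproduced at this point. One reading that is consistent with Assumptions 3.1 and 3.2, offered here as an assumption rather than as the authors’ exact formulas, is a $Q$-weighted sum: $\eta_{ji}=\sum_k q_{ik}\alpha_{jk}$ and $\eta'_{je}=\sum_k q'_{ek}\beta_{jk}$ with $\beta_{jk}=c_j\alpha_{jk}$. The sketch below computes both for toy values; the function name `predicted_mastery` and all numbers are hypothetical.

```python
import numpy as np


def predicted_mastery(alpha, c, Q, Q_prime):
    """Return (eta, eta_prime) under the assumed Q-weighted-sum reading.

    alpha:   (M, K) theoretical mastery per student and skill
    c:       (M,)   programming ability per student
    Q:       (N_i, K) and Q_prime: (N_e, K), rows assumed to sum to 1
    """
    beta = c[:, None] * alpha        # Assumption 3.1: beta_jk = c_j * alpha_jk
    eta = alpha @ Q.T                # (M, N_i) mastery of theoretical problems
    eta_prime = beta @ Q_prime.T     # (M, N_e) mastery of experiments
    return eta, eta_prime


alpha = np.array([[0.8, 0.4],
                  [0.5, 1.0]])
c = np.array([0.5, 1.0])
Q = np.array([[0.5, 0.5],            # question 1 examines both skills
              [1.0, 0.0]])           # question 2 examines skill 1 only
Q_prime = np.array([[0.0, 1.0]])     # one experiment on skill 2
eta, eta_prime = predicted_mastery(alpha, c, Q, Q_prime)
```

In this toy case the first student’s experiment mastery is $0.5 \times 0.4 = 0.2$, which shows how a low $c_j$ limits experimental performance even when the theory is partly mastered.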
Actual score: In practice, students may give correct answers without mastering the knowledge concepts because they guess, or write wrong answers because of carelessness and other reasons. As a result, the actual score often deviates from the real mastery[19]. Following the practice of probabilistic matrix factorization in recommendation systems[20], we use a Gaussian distribution to model the actual score given the student’s mastery of the problem.
where $\sigma_R$ and $\sigma_{R'}$ are hyper-parameters and $I$ is the identity matrix.
In our model, the actual score follows a Gaussian distribution with the mastery as the mean. It is reasonable that the actual score is related to the student’s mastery of the problem, with a small deviation caused by uncontrollable factors.
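One way to write the score model described above, assuming independent noise on every entry (equivalently, a covariance of $\sigma_R^2 I$ or $\sigma_{R'}^2 I$ in vector form), is the following. The entrywise form is an assumption made for illustration.

$$R_{ji} \sim \mathcal{N}\left(\eta_{ji},\ \sigma_R^2\right), \qquad R'_{je} \sim \mathcal{N}\left(\eta'_{je},\ \sigma_{R'}^2\right).$$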
Figure 1. The probability graph model of CDF-CSE
Figure 2. The performance of each model in data set “data structure”
where $\sigma_c$ and $\sigma_\alpha$ are hyper-parameters.
According to the above probabilistic graphical model and the assumptions on the parameters, given the observable data, the posterior distribution of $c$ and $\alpha$ can be written as:
$$P(c,\alpha \mid R,R') \propto P(R \mid \alpha)\,P(R' \mid c,\alpha)\,P(c)\,P(\alpha).$$
The probability distributions of these parameters are:
Let $F(c,\alpha)$ be the negative log-posterior of $c$ and $\alpha$ over the entire data, omitting constants, which is written as:
Our goal is to minimize the objective function $F(c,\alpha)$.
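To make the objective concrete, the sketch below evaluates a negative log-posterior of this kind under stated assumptions: Gaussian likelihoods centred on $\eta$ and $\eta'$ (read as $Q$-weighted sums), and Gaussian priors on $c$ and $\alpha$ with standard deviations $\sigma_c$ and $\sigma_\alpha$. The prior means `mu_c` and `mu_alpha`, the treatment of `NaN` as an unobserved score, and the function name are hypothetical choices, not taken from the paper.

```python
import numpy as np


def neg_log_posterior(c, alpha, R, R_prime, Q, Q_prime,
                      sigma_R=1.0, sigma_Rp=1.0, sigma_c=1.0, sigma_alpha=1.0,
                      mu_c=0.0, mu_alpha=0.0):
    """F(c, alpha) up to additive constants, under assumed Gaussian priors.

    NaN entries in R and R_prime are treated as unobserved (an assumption).
    """
    eta = alpha @ Q.T                            # (M, N_i)
    eta_p = (c[:, None] * alpha) @ Q_prime.T     # (M, N_e)
    err = np.where(np.isnan(R), 0.0, R - eta)
    err_p = np.where(np.isnan(R_prime), 0.0, R_prime - eta_p)
    return (np.sum(err ** 2) / (2 * sigma_R ** 2)
            + np.sum(err_p ** 2) / (2 * sigma_Rp ** 2)
            + np.sum((c - mu_c) ** 2) / (2 * sigma_c ** 2)
            + np.sum((alpha - mu_alpha) ** 2) / (2 * sigma_alpha ** 2))


# One student, one skill, one question, one experiment.
c0 = np.array([1.0])
a0 = np.array([[1.0]])
one = np.array([[1.0]])
F_fit = neg_log_posterior(c0, a0, one, one, one, one)
```

Here the scores are fitted exactly, so only the two prior terms contribute to `F_fit`.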
Noting the conditional independence relationships among the model parameters, we devise the following alternating optimization algorithm, which repeats two optimization steps, one with respect to $c$ and the other with respect to $\alpha$, until convergence.
Step 1 Optimization w.r.t. $c$
With $\alpha$ fixed, the parameters $c_j$ of different students $j$ are independent of each other. Therefore, we can solve an independent optimization problem for each particular $j$. This means that a large problem is decomposed into relatively small problems, which leads to an efficient algorithm. Any numerical optimization method can be used to solve each subproblem. In our implementation, we employ the gradient descent method.
$$c^{\mathrm{new}} = c^{\mathrm{old}} - r_1\, g(c),$$
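A minimal sketch of this update follows, reading $r_1$ as a step size and $g(c)$ as the gradient of the objective with respect to $c$ (both readings are assumptions). It also assumes a Gaussian likelihood for $R'$ with $\eta'_{je}=c_j\sum_k q'_{ek}\alpha_{jk}$, a Gaussian prior on $c$ with hypothetical mean `mu_c`, and fully observed experiment scores.

```python
import numpy as np


def update_c(c, alpha, R_prime, Q_prime, r1=0.1,
             sigma_Rp=1.0, sigma_c=1.0, mu_c=0.0):
    """One gradient step c_new = c_old - r1 * g(c), vectorized over students."""
    s = alpha @ Q_prime.T                  # (M, N_e): eta' per unit of c_j
    resid = R_prime - c[:, None] * s       # residuals on experiment scores
    # Gradient of the assumed objective w.r.t. each c_j (students decouple).
    g = -(resid * s).sum(axis=1) / sigma_Rp ** 2 + (c - mu_c) / sigma_c ** 2
    return c - r1 * g


alpha = np.array([[1.0]])
Q_prime = np.array([[1.0]])
R_prime = np.array([[1.0]])
c_next = update_c(np.array([0.0]), alpha, R_prime, Q_prime)
```

Each component of `g` depends only on the corresponding student’s row, so the loop over students can run in parallel, which matches the decomposition described above.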
Step 2 Optimization w.r.t. $\alpha$
Similarly, with $c$ fixed, the parameter $\alpha_{jk}$ for each student $j$ and each skill $k$ is independent. Therefore, we can optimize each $\alpha_{jk}$ in parallel.
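The corresponding step for $\alpha$ can be sketched in the same way. This is an illustration under the same assumptions as before (Gaussian likelihoods with $Q$-weighted means, a Gaussian prior on $\alpha$ with hypothetical mean `mu_alpha`, fully observed scores); the step size `r2` and the function name are also hypothetical. All entries are updated simultaneously from the gradient at the current point, which is one simple way to realize a parallel update.

```python
import numpy as np


def update_alpha(alpha, c, R, R_prime, Q, Q_prime, r2=0.1,
                 sigma_R=1.0, sigma_Rp=1.0, sigma_alpha=1.0, mu_alpha=0.0):
    """One gradient step on alpha with c held fixed."""
    err = R - alpha @ Q.T                                # (M, N_i)
    err_p = R_prime - (c[:, None] * alpha) @ Q_prime.T   # (M, N_e)
    grad = (-(err @ Q) / sigma_R ** 2
            - c[:, None] * (err_p @ Q_prime) / sigma_Rp ** 2
            + (alpha - mu_alpha) / sigma_alpha ** 2)
    return alpha - r2 * grad


one = np.array([[1.0]])
alpha_next = update_alpha(np.array([[0.0]]), np.array([1.0]), one, one, one, one)
```

Alternating `update_alpha` with a step on $c$ until the objective stops decreasing gives the two-step scheme described in this section.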
We collected data from computer science courses at the University of Science and Technology of China to verify our model. We train our model on three data sets: a real data set from the course “data structure”, a real data set from the course “network security”, and a synthetic data set. All three data sets contain students’ scores $R$ and $R'$ on theoretical questions and experiments, as well as $Q$ and $Q'$, which indicate the knowledge concepts examined by the questions. In the real data sets, the scores of students and the knowledge concepts required by questions or experiments are given by teachers. A brief summary of these data sets is shown in Table 1.
Table 1. Overview of datasets
We compare CDF-CSE with three baseline methods to demonstrate the effectiveness and interpretability of the proposed cognitive modelling framework.
(Ⅰ) Item response theory (IRT)[8] is a typical method of cognitive diagnosis, which assumes that students’ abilities are unidimensional and that items (the items in IRT are the questions in cognitive diagnostic scenarios) are locally independent;
(Ⅱ) Probabilistic matrix factorization (PMF)[20] is one of the popular matrix factorization methods. Its main idea is to factorize the score matrix into two matrices, one of which represents the latent characteristics of users and the other the latent characteristics of items. These two matrices are then used to predict new scores;
(Ⅲ) Fuzzy CDF[21] introduces the concept of fuzzy systems into CDMs so that cognitive diagnosis can be applied to both objective and subjective problems. Therefore, the model can predict students’ scores as continuous values. It combines the logistic model used in IRT with the Q-matrix used in DINA.
As mentioned above, our data sets include theoretical and experimental questions, but none of the three competing methods can be trained directly on such data sets. To solve this problem, we train them in two settings: ① treating both kinds of questions as the same kind of question; ② dividing the two kinds of questions into two data sets and training on them separately.
Parameters $\alpha$ and $\beta$ are the theoretical and practical abilities of students estimated by the model. We use these two parameters to predict students’ scores, and we judge the reliability of the model by the error of the predicted scores. For both the proposed model and the other methods, the evaluation is based on the error between the predicted scores and the real scores, which we measure with two metrics, MAE and RMSE. Both CDF-CSE and the baseline approaches are implemented in Python on a Core i5 3.2 GHz machine with Windows 7 and 8 GB of memory.
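For reference, the two metrics follow their standard definitions, sketched below. The function names are chosen for illustration only.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error between real and predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between real and predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


real = [1.0, 0.0]
predicted = [0.5, 0.5]
mae_value = mae(real, predicted)
rmse_value = rmse(real, predicted)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of prediction quality.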
To observe how these methods behave at different sparsity levels, we construct training sets of different sizes, containing 10% to 80% of the score data of each data set, and use the rest for testing. For comparison, we tune the parameters of each algorithm and record its best performance. In the experiments, we consider three implementations of the matrix factorization method PMF: PMF-5D, PMF-10D and PMF-KD denote PMF with 5, 10 and $K$ (the number of knowledge concepts) latent factors, respectively. Thus, there are six results in total for each split.
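A split of this kind can be sketched as follows. The uniform random sampling over observed entries, the fixed seed and the function name `split_scores` are assumptions made for illustration; the paper does not describe how the entries were selected.

```python
import numpy as np


def split_scores(R: np.ndarray, train_ratio: float, seed: int = 0):
    """Randomly assign each observed score to the training or the test set.

    NaN entries are treated as unobserved and belong to neither set.
    """
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(R))    # (n_obs, 2) index pairs
    rng.shuffle(observed)                   # shuffles the rows in place
    n_train = int(round(train_ratio * len(observed)))
    train_mask = np.zeros(R.shape, dtype=bool)
    train_mask[tuple(observed[:n_train].T)] = True
    test_mask = ~np.isnan(R) & ~train_mask
    return train_mask, test_mask


R = np.ones((5, 4))                         # 20 observed toy scores
train_mask, test_mask = split_scores(R, train_ratio=0.2)
```

Repeating the call with `train_ratio` values from 0.1 to 0.8 would produce the sequence of sparsity levels described above.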
Figures 2-4 show the score prediction performance of CDF-CSE and the baseline methods on the different data sets. From Figures 2-4, we observe that CDF-CSE performs best on all data sets. Specifically, it beats PMF by incorporating educational hypotheses, it beats IRT by quantitatively analysing examinees from a fuzzy viewpoint, and it beats all other methods by combining theory and experiment. More importantly, as the training data become sparser (the training data ratio declines from 80% to 20%), the superiority of CDF-CSE becomes more and more significant. For instance, with 20% training data and under the MAE metric, the improvement of CDF-CSE over the best baseline method can reach 47.8%, 65.8%, and 49.8% on the three data sets when both kinds of questions are treated as the same kind of question.
Figure 3. The performance of each model in data set “network security”
Figure 4. The performance of each model in synthetic data set
Figure 5. The performance of each model in teaching process
The results show that the proposed model is more accurate than the other methods. On data sets that separate the two kinds of questions, CDF-CSE can be trained on theoretical and experimental questions together, so it obtains more information during training than models that consider only one kind of problem. On data sets that treat the two kinds of questions as one, our model still provides a different probabilistic assumption for each kind, which is in line with real experience, whereas the other models can consider only one probability distribution, which inevitably produces errors. Even in the special situation where students have the same score distribution on both kinds of questions, our model works well. Moreover, a student’s performance tends to be consistent across the theoretical and experimental problems that examine the same knowledge concepts, and our model makes good use of this characteristic. When the model observes that a student has a good grasp of the theory of a knowledge concept, it predicts that the student also has a good grasp of the experiments on this knowledge concept. Conversely, the experimental performance of a student can be used to infer his/her theoretical ability. This approach is in line with teaching experience, and the experimental results show that it is feasible. By contrast, the competing methods sometimes perform poorly on a data set. In summary, CDF-CSE captures the characteristics of students more precisely, and it is also more suitable for real-world and synthetic scenarios where the data are sparse.
Besides, we hope that cognitive diagnostic models will not only evaluate students after a course is completed, but also give students feedback during the course. In this way, cognitive diagnostic models can help students find their shortcomings and adjust their learning plans in a timely manner while they are studying. Therefore, we conducted an experiment that follows the process of the course, that is, training in the chronological order in which the theoretical questions and experiments were assigned. In this experiment, we fixed the training set at 80% of the data and used the rest as the test set. Following the chronological order, only a few questions are used for training at the beginning, and the number of questions is then gradually increased.
Figure 5 shows the results of CDF-CSE and the baseline methods during the teaching process on the data sets. CDF-CSE still performs best on all data sets. From the perspective of following a course, our model already performs better at an early stage (when there are fewer knowledge concepts and questions), and its advantages remain clear as the amount of data increases. For example, when the number of questions is small and under the MAE metric, the improvement of CDF-CSE over the best competing method can reach 37.8%, 42.5% and 27.7% on the three data sets. Under the same circumstances with more questions, the improvement of our model can reach 32.3%, 36.5% and 45.6% on the three data sets. This indicates that it is feasible to combine theoretical and experimental performance to analyze students when little data is available. At the early stage of teaching, our model can already analyze the characteristics of students well, and as the course progresses, the analysis results become more and more accurate. In summary, the proposed model can also follow a complete computer science course.
The experimental results show that CDF-CSE outperforms the competing methods in predicting student performance. This is because our model extracts the common characteristics of the two kinds of questions and distinguishes their differences at the same time, so as to diagnose students’ theoretical and practical cognition. Compared with other models, our results are more complete and accurate. The experimental results also confirm that our model can be applied to different situations. Therefore, we can use different kinds of student data to obtain more comprehensive cognitive information, and we conclude that our model can address the problem of inaccurate feedback in traditional teaching. In future applications, CDF-CSE can produce interpretable cognitive analysis results for students, which can be used to compose a detailed, human-readable diagnosis report. At the same time, its prediction of student performance can help teachers understand the teaching situation and provide personalized teaching. In computer science courses, it can help students improve themselves and assist teachers in adjusting their teaching plans for students.
In this paper, we propose a cognitive diagnostic framework (CDF-CSE) for computer science education, so that students’ theoretical and practical abilities can be explored at the same time. Specifically, our model defines students’ programming ability and combines students’ theoretical ability with their experimental ability. We propose an algorithm to optimize the parameters of the model. The experimental results on data sets from computer science courses at the University of Science and Technology of China demonstrate that CDF-CSE can diagnose the characteristics of each student quantitatively and interpretably, and thus performs better in predicting students’ performance. In particular, experiments on real computer science education data sets indicate that our model can be applied in real courses to help students understand their programming level. Our model also produces accurate results during the teaching process, which helps teachers understand students’ learning status and adjust their teaching plans.
However, there is still room for improvement. First, CDF-CSE currently has high computational complexity, so it is important to design a more efficient parameter optimization algorithm. Second, the prerequisite relationships among knowledge concepts should be considered in cognitive modelling. Last but not least, there are many code-related features that should be considered in a cognitive diagnosis model for computer science education. Besides, we plan to apply the improved model in actual courses to demonstrate its practicability and to refine it according to the feedback.