Generation of Personalized Knowledge Graphs Based on GCN

1. Introduction

More than 2000 years ago, Socrates, the ancient Greek educator, observed that the most common mistake in education is indoctrination, which wrongly treats students as containers to be filled. Education is the kindling of a flame, not the filling of a vessel. Traditional education is teacher-centered and pays little attention to students’ individual differences and needs, so it fails to stimulate their enthusiasm for learning. Experts and scholars in the field of education have noticed these shortcomings and have made great efforts to find better solutions.

In the information age, personalized learning offers a broad and fair way to support educators in empowering learners, and it has become a highly regarded learning concept. Students differ in their ability to absorb knowledge because of differences in living environment, learning style, thinking style, gender and other factors. Personalized learning tailors a learning program to each student according to that student’s growth, background, interests and experience, so that students can take the initiative and learn efficiently.

The practice of personalized learning depends on the support of theory and technology. With the rapid development of computer technology, computer-supported personalized learning combines computer technology with personalized learning so that learners can quickly select suitable resources from a large pool of learning resources and learn more efficiently. At the present stage, research on personalized learning remains mainly at the level of pedagogical theory and has not been integrated with computer technology to better promote its development. Meanwhile, a major difficulty that students encounter in the learning process is how to structure scattered knowledge so that it can be represented. To address this, this paper proposes a method for generating personalized knowledge graphs based on the Graph Convolutional Network (GCN) to help students realize personalized learning. The main work of this paper is as follows:

• The Word2vec model is used to generate word vectors, and the Neo4j graph database is used to construct a knowledge graph for the vertical domain of Junior High School English (JHSE).

• The adjacency matrix corresponding to the knowledge graph is constructed, the exercise text is segmented with word segmentation tools so that word vectors can be generated, and rules are formulated to generate the feature vectors of the exercise data.

• A graph convolutional network is trained on the original JHSE knowledge graph, and the threshold is adjusted to optimize the training results and generate the personalized knowledge graph.

2. Related Work

In recent years, research on personalized learning in the field of education at home and abroad has produced many meaningful results. Cao and Zhu tried to build a personalized learning platform based on the dynamic collection, accurate analysis and visual feedback of learning analysis [1]. Kong et al. summarized the development of personalized learning and discussed how to use technology to support it [2]. In the implementation of personalized learning, Song et al. used mobile devices and online learning platforms to build a personalized learning environment and carry out personalized learning [3]. Zhou built an adaptive learning platform that pushes personalized learning resources to learners [4]. Wu integrated the personalized learning concept with the flipped classroom and proposed a flipped classroom teaching mode based on the personalized learning concept [5]. Ma et al. proposed a personalized adaptive system architecture model based on big data and a learning resource push method based on collaborative filtering [6]. Starting from the characteristics and needs of ubiquitous learning, Chen and Yang designed and implemented a personalized learning evaluation system based on process information [7].

The knowledge graph is a core technology of artificial intelligence research and intelligent information services, which can endow agents with the capabilities of accurate query, in-depth understanding and logical reasoning. Knowledge graphs are widely used in knowledge-driven tasks such as search engines, question answering systems, intelligent dialogue systems and personalized recommendation [8]. By mining entity and relation information from real texts, world knowledge can be organized into structured knowledge networks. Large-scale world knowledge graphs such as Freebase [9] and DBpedia [10] contain a large amount of structured world knowledge. There has also been a lot of research on learning word representations; Mikolov et al. proposed the word embedding method Word2vec [11]. Wang et al. described the construction method and process of an enterprise-level knowledge graph platform [12]. Hu proposed a knowledge graph construction method based on multiple data sources and constructed a general-purpose Chinese knowledge graph [13]. Zhang et al. proposed a knowledge reasoning method based on the knowledge graph of the mathematics curriculum, which improves the effect of knowledge services [14]. From the perspective of human-machine cooperation, Li et al. proposed a construction method for the knowledge graph in adaptive learning systems [15]. Liu analyzed the characteristics of robot education in primary and secondary schools and proposed a knowledge-graph-based teaching mode for robot education in junior high schools [16].

The Graph Convolutional Network was first proposed by Kipf and Welling in 2017 to process graph data [17]. Graph neural networks and the convolutional neural networks they build on are widely used in many fields. Liu et al. used a convolutional neural network to analyze the emotional tendency of microblogs [18]. Chang et al. studied the application of convolutional neural networks in image object detection and face recognition [19]. Yang and Huang improved the original GoogLeNet convolutional neural network and applied it to clothing matching recommendation with good results [20].

3. Model Architecture

When generating a personalized knowledge graph, some nodes in the knowledge graph are first labeled with one of two categories: mastered by the student or not mastered by the student. The adjacency matrix of the knowledge graph and the feature vectors of the nodes are then taken as the input of the graph convolutional network. Through iterative training of the graph convolutional network model, all nodes in the knowledge graph can be classified. Finally, a personalized knowledge graph is generated for each student. The process is shown in Figure 1.

3.1. Data Preparation

3.1.1. Generation of the Adjacency Matrix of the Knowledge Graph

A computer cannot process and store graph data directly, so the graph must be stored as an adjacency matrix or an adjacency list. When an adjacency matrix is used, each node in the graph is numbered, the row and column indices of the matrix represent the nodes, and the values in the matrix represent the relationships between nodes. The knowledge graph generated in this paper is an undirected graph composed of exercise nodes, so its adjacency matrix is symmetric. When NumPy is used to generate the adjacency matrix, a two-dimensional matrix of size 9449 × 9449 filled with zeros is defined first, where 9449 is the number of nodes in the knowledge graph. When the similarity between two exercise nodes is higher than the threshold defined in this paper, the two nodes are connected by an edge in the knowledge graph, and the corresponding positions in the adjacency matrix are set to 1. Because the matrix is updated whenever an edge is created between two nodes, the adjacency matrix is generated at the same time as the knowledge graph. Part of the generated adjacency matrix is shown in Figure 2.
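The construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `build_adjacency`, the four-node similarity matrix and the threshold value are made up for the example and are not taken from the paper's code.

```python
import numpy as np

def build_adjacency(similarity: np.ndarray, threshold: float) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix from a pairwise similarity matrix."""
    n = similarity.shape[0]
    adjacency = np.zeros((n, n), dtype=np.int8)   # start from an all-zero matrix
    for i in range(n):
        for j in range(i + 1, n):                 # visit each unordered pair once
            if similarity[i, j] > threshold:
                adjacency[i, j] = 1
                adjacency[j, i] = 1               # undirected graph: mirror the edge
    return adjacency

# Toy similarity scores for four hypothetical exercises (not real data).
toy_similarity = np.array([
    [1.0, 0.9, 0.2, 0.4],
    [0.9, 1.0, 0.85, 0.1],
    [0.2, 0.85, 1.0, 0.3],
    [0.4, 0.1, 0.3, 1.0],
])
toy_adjacency = build_adjacency(toy_similarity, threshold=0.8)
print(toy_adjacency)
```

With a threshold of 0.8, only the pairs (0, 1) and (1, 2) are connected, and the last exercise stays isolated. For the full 9449-node graph, a vectorized comparison would be faster than the double loop; the loop is kept here because it mirrors the edge-by-edge description in the text.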

3.1.2. Generation of Feature Vectors

Figure 1. Construction process of personalized knowledge graph.

Figure 2. Part of the generated adjacency matrix information.

In this paper, Word2vec is used to generate a feature vector for each exercise, and the dimension of the feature vectors is set to 512. First, the stem and the analysis of each exercise are segmented into words, using Word2vec and Jieba respectively. After all the words involved in the exercise data are obtained, the exercise stems and the English words they contain are passed to the Word2vec model for training, which generates a word vector for each English word. The sentence vector SQ of an exercise stem is then obtained by adding the word vectors of all the words in the stem and dividing by the number n of words in the stem, as shown in Equation (1), where SQ_{k} is the value of the k-th dimension of the sentence vector SQ and W_{ik} is the value of the k-th dimension of the i-th word vector in the exercise stem.

$SQ=\left(S{Q}_{1},\dots ,S{Q}_{k},\dots ,S{Q}_{512}\right)=\left(\frac{{\sum }_{i=1}^{n}{W}_{i1}}{n},\dots ,\frac{{\sum }_{i=1}^{n}{W}_{ik}}{n},\dots ,\frac{{\sum }_{i=1}^{n}{W}_{i512}}{n}\right)$ (1)

After that, the sentence vector of the exercise stem and the sentence vector of the exercise analysis are added and divided by 2 to obtain the feature vector ST of the exercise, as shown in Equation (2). ST_{k} represents the value of the k-th dimension of the feature vector, SQ_{k} represents the value of the k-th dimension of the sentence vector of the stem, and SA_{k} represents the value of the k-th dimension of the sentence vector of the exercise analysis.

$ST=\left(S{T}_{1},\dots ,S{T}_{k},\dots ,S{T}_{512}\right)=\left(\frac{S{Q}_{1}+S{A}_{1}}{2},\dots ,\frac{S{Q}_{k}+S{A}_{k}}{2},\dots ,\frac{S{Q}_{512}+S{A}_{512}}{2}\right)$ (2)
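As a small worked example of Equations (1) and (2), the sketch below averages word vectors into sentence vectors and then averages the stem and analysis vectors into an exercise feature. The function names and the 3-dimensional toy vectors are assumptions chosen for readability; the paper uses 512 dimensions and word vectors trained with Word2vec.

```python
import numpy as np

def sentence_vector(word_vectors: np.ndarray) -> np.ndarray:
    """Equation (1): average the n word vectors (rows) dimension by dimension."""
    return word_vectors.sum(axis=0) / word_vectors.shape[0]

def exercise_feature(stem_words: np.ndarray, analysis_words: np.ndarray) -> np.ndarray:
    """Equation (2): average the stem vector SQ and the analysis vector SA."""
    sq = sentence_vector(stem_words)
    sa = sentence_vector(analysis_words)
    return (sq + sa) / 2

# Toy word vectors (hypothetical values, 3 dimensions instead of 512).
stem = np.array([[1.0, 2.0, 3.0],
                 [3.0, 2.0, 1.0]])          # n = 2 words in the stem
analysis = np.array([[0.0, 0.0, 0.0],
                     [2.0, 4.0, 6.0],
                     [4.0, 2.0, 6.0]])      # n = 3 words in the analysis
st = exercise_feature(stem, analysis)
print(st)
```

Here SQ is (2, 2, 2) and SA is (2, 2, 4), so the feature vector ST is (2, 2, 3).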

3.2. Classification Using Graph Convolutional Network

A graph neural network captures the relationships between nodes through message passing between the nodes of a graph, and performs well in knowledge representation, node classification, link prediction and similar tasks. As a kind of graph neural network, GCN performs well in node classification. This paper applies the graph convolutional network, a semi-supervised learning model, to node classification in order to generate the personalized knowledge graph. A graph convolutional network is a graph neural network model that performs convolution operations on a graph. This convolution has many similarities with the convolution in a convolutional neural network, such as weight sharing, local connectivity and a multi-layer structure. The knowledge graph contains many edges and interconnected nodes. When representing a node, the model uses the nodes around it, which is the graph convolution process. During convolution, the features of the surrounding nodes affect the representation of the current node.

The essence of this process is feature extraction. As training proceeds, deeper features are extracted. A graph convolutional network has multiple hidden layers, and in a multilayer graph convolutional network the propagation rule between layers is defined by Equation (3).

${H}^{(l+1)}=\sigma \left({\tilde{D}}^{-\frac{1}{2}}(A+{I}_{N}){\tilde{D}}^{-\frac{1}{2}}{H}^{(l)}{W}^{(l)}\right)$ (3)

In the above equation, A represents the adjacency matrix of the knowledge graph and ${I}_{N}$ represents the identity matrix of the same size. $A+{I}_{N}$ means that self-connections are added to the adjacency matrix, that is, every exercise node is assumed to have an edge to itself. The self-connected knowledge graph avoids losing a node’s own features when training the graph convolutional network, so the model learns better. $\tilde{D}$ is the diagonal degree matrix of $A+{I}_{N}$, with ${\tilde{D}}_{ii}={\sum }_{j}{(A+{I}_{N})}_{ij}$, and ${\tilde{D}}^{-\frac{1}{2}}$ normalizes the self-connected adjacency matrix. ${W}^{(l)}$ represents the weight matrix to be learned by the l-th layer, ${H}^{(l)}$ represents the state matrix of the l-th hidden layer, and $\sigma $ represents the activation function of the current hidden layer. Any activation function can be used; the ReLU function is selected here.

The convolution methods of graph convolutional networks can be roughly divided into two types: spectral convolution and spatial domain convolution. When the amount of data is small, spatial domain convolution is selected; when the amount of data is large, spectral convolution is used because spatial domain convolution becomes too expensive to compute. The convolution method used in this paper is spectral convolution, which processes the filters and graph signals of the graph convolutional network in the Fourier domain. Equation (3) is the most basic propagation rule in the graph convolutional network. It can be motivated by a first-order approximation of localized spectral filters on the knowledge graph, which simplifies the calculation, reduces the amount of computation and speeds up the model.

In order to make the model work better, the graph is represented by its Laplace matrix L in the process of spectral convolution. The most basic definition of the Laplace matrix is $L=D-A$, where D is a diagonal matrix whose diagonal entries are the degrees of the vertices of the graph and A is the adjacency matrix of the graph. The normalized form used in the graph convolutional network is $L={D}^{-\frac{1}{2}}(D-A){D}^{-\frac{1}{2}}$. The Laplace matrix L is a symmetric positive semidefinite matrix with N linearly independent eigenvectors, where N is the number of nodes. Such a matrix always admits an eigendecomposition, which allows the spectral convolution to be carried out.
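The properties stated above can be checked numerically on a small graph. The following sketch assumes a hypothetical three-node path graph (not the JHSE graph) without isolated nodes, builds the normalized Laplace matrix and decomposes it with `numpy.linalg.eigh`, which is intended for symmetric matrices.

```python
import numpy as np

def normalized_laplacian(adjacency: np.ndarray) -> np.ndarray:
    """L = D^{-1/2} (D - A) D^{-1/2}; assumes every node has at least one edge."""
    degree = adjacency.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degree))
    return d_inv_sqrt @ (np.diag(degree) - adjacency) @ d_inv_sqrt

# Toy path graph 0 - 1 - 2 (hypothetical example).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = normalized_laplacian(A)

# Eigendecomposition L = U diag(eigenvalues) U^T, eigenvalues in ascending order.
eigenvalues, U = np.linalg.eigh(L)
print(eigenvalues)
```

For this graph the eigenvalues are 0, 1 and 2 (up to floating-point error): all are non-negative, which matches positive semidefiniteness, and the columns of U form an orthonormal set of eigenvectors.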

The spectral convolution on a graph can be defined as the product of a graph signal $x\in {R}^{N}$ and a filter ${g}_{\theta}=diag(\theta )$ that is parameterized in the Fourier domain by $\theta \in {R}^{N}$, as shown in Equation (4).

${g}_{\theta}*x=U{g}_{\theta}(\Lambda ){U}^{T}x$ (4)

${U}^{T}x$ is the graph Fourier transform of x, ${g}_{\theta}(\Lambda )$ is a function of the eigenvalues of the graph Laplace matrix, U is the matrix composed of the eigenvectors of the normalized graph Laplace matrix, and $\Lambda $ is the diagonal matrix composed of the eigenvalues of L. The calculation method of the graph Laplace matrix is shown in Equation (5).

$L={I}_{N}-{D}^{-\frac{1}{2}}A{D}^{-\frac{1}{2}}=U\Lambda {U}^{T}$ (5)

The above equations can be evaluated directly on a small-scale graph. However, the JHSE knowledge graph in this paper contains 9449 exercise nodes, which is relatively large, so the eigenvalue decomposition of the graph Laplace matrix is computationally expensive and would lengthen the calculation time of the model. To solve this problem, this paper approximates ${g}_{\theta}(\Lambda )$ with a K-th order expansion in Chebyshev polynomials ${T}_{k}(x)$, as shown in Equation (6).

${g}_{\theta}(\Lambda )\approx {\sum }_{k=0}^{K}{\tilde{\theta }}_{k}{T}_{k}(\tilde{\Lambda })$ (6)

where $\tilde{\Lambda }=2\Lambda /{\lambda}_{max}-{I}_{N}$, ${\lambda}_{max}$ denotes the largest eigenvalue of L, and $\tilde{\theta }\in {R}^{K}$ is the vector of Chebyshev coefficients. The Chebyshev polynomials are defined recursively by ${T}_{k}(x)=2x{T}_{k-1}(x)-{T}_{k-2}(x)$ with ${T}_{0}(x)=1$ and ${T}_{1}(x)=x$, so Equation (4) can be written as Equation (7).

${g}_{\theta}*x\approx {\sum }_{k=0}^{K}{\tilde{\theta }}_{k}{T}_{k}\left(\frac{2}{{\lambda}_{max}}L-{I}_{N}\right)x$ (7)
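To make the Chebyshev form concrete, the sketch below evaluates the sum in Equation (7) with the three-term recurrence, so that no eigendecomposition is needed. The function name `chebyshev_filter`, the three-node path graph, the coefficient values and the choice of passing ${\lambda}_{max}$ as an argument are all assumptions made for this illustration.

```python
import numpy as np

def chebyshev_filter(laplacian, x, theta, lambda_max):
    """Evaluate sum_k theta[k] * T_k(L_tilde) @ x, where K = len(theta) - 1."""
    n = laplacian.shape[0]
    l_tilde = (2.0 / lambda_max) * laplacian - np.eye(n)   # rescale spectrum to [-1, 1]
    t_prev = x                                 # T_0(L_tilde) x = x
    result = theta[0] * t_prev
    if len(theta) > 1:
        t_curr = l_tilde @ x                   # T_1(L_tilde) x = L_tilde x
        result = result + theta[1] * t_curr
        for k in range(2, len(theta)):
            # Recurrence: T_k = 2 * L_tilde * T_{k-1} - T_{k-2}
            t_next = 2.0 * (l_tilde @ t_curr) - t_prev
            result = result + theta[k] * t_next
            t_prev, t_curr = t_curr, t_next
    return result

# Toy path graph 0 - 1 - 2 and its normalized Laplace matrix, as in Equation (5).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
laplacian = np.eye(3) - d_inv_sqrt @ A @ d_inv_sqrt

x = np.array([1.0, 0.0, 0.0])          # signal concentrated on node 0
theta = np.array([0.5, 0.3, 0.2])      # hypothetical coefficients, K = 2
filtered = chebyshev_filter(laplacian, x, theta, lambda_max=2.0)  # 2.0 is exact for this graph
print(filtered)
```

Each step only multiplies a vector by a sparse-friendly matrix, which is why this form scales better than decomposing the Laplace matrix of a graph with thousands of nodes. With only two coefficients (K = 1), the output at node 2 is zero for this signal, because node 2 is two steps away from node 0.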

Equation (7) is a K-th order polynomial in the Laplace matrix, so the filter is K-localized: the representation of a node in the JHSE knowledge graph depends only on the exercise nodes whose path length from it is at most K. A graph convolutional network can stack multiple convolution layers, each of which has the form of Equation (7). As the number of convolution layers increases, the number of parameters in the model also increases, which slows down the calculation and leads to over-fitting. To prevent these problems, the number of convolution layers is set to two in this paper, and the forward propagation model can be written as Equation (8).

$Z=f(X,A)=softmax\left(\tilde{A}\,ReLU(\tilde{A}X{W}^{0}){W}^{1}\right)$ (8)

${W}^{0}$ is the weight matrix of the first convolution layer, ${W}^{1}$ is the weight matrix of the second convolution layer, and $\tilde{A}={\tilde{D}}^{-\frac{1}{2}}(A+{I}_{N}){\tilde{D}}^{-\frac{1}{2}}$, where $\tilde{D}$ is the diagonal degree matrix of $A+{I}_{N}$. The model outputs the category probabilities of each node, from which the personalized knowledge graph is generated.
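The two-layer forward pass of Equation (8) can be sketched in plain NumPy as follows. This is an untrained illustration, not the authors' implementation: the weights are random, the four-node graph is made up, and the feature and hidden sizes (8 and 16) are arbitrary stand-ins for the values used in the paper.

```python
import numpy as np

def normalize_adjacency(adjacency):
    """A_tilde = D_tilde^{-1/2} (A + I_N) D_tilde^{-1/2}."""
    a_self = adjacency + np.eye(adjacency.shape[0])    # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(a_self.sum(axis=1))     # degrees of A + I_N
    return a_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)               # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(adjacency, features, w0, w1):
    """Equation (8): softmax(A_tilde ReLU(A_tilde X W0) W1)."""
    a_tilde = normalize_adjacency(adjacency)
    hidden = np.maximum(a_tilde @ features @ w0, 0.0)  # first layer with ReLU
    return softmax(a_tilde @ hidden @ w1)              # second layer, row-wise softmax

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # hypothetical 4-node graph
X = rng.normal(size=(4, 8))                 # node features (512-dimensional in the paper)
W0 = rng.normal(size=(8, 16))               # random, untrained first-layer weights
W1 = rng.normal(size=(16, 2))               # two classes: mastered / not mastered
Z = gcn_forward(A, X, W0, W1)
print(Z.shape)
```

Each row of Z is a probability distribution over the two categories for one node. In actual training, W0 and W1 would be fitted by minimizing a classification loss on the labeled nodes only, which is what makes the model semi-supervised.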

3.3. Generation of Personalized Knowledge Graph

After forward propagation, the model outputs the category probabilities of each node in the JHSE exercise knowledge graph. If the first probability value of an exercise node is the larger one, the exercise is labeled as mastered; otherwise it is labeled as not mastered. In this way the probability values output by the model are converted into the label of each exercise. Then, by establishing a mapping file between exercise numbers and exercise labels, each label is linked to its exercise number and expressed in the Junior High School English knowledge graph.
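The conversion from probabilities to labels might look like the following minimal sketch. The probability values and exercise numbers are invented for the example, and the mapping is kept as an in-memory dictionary rather than the mapping file mentioned in the text.

```python
import numpy as np

# Hypothetical model output: one row per exercise, columns = (mastered, not mastered).
probabilities = np.array([[0.9, 0.1],
                          [0.3, 0.7],
                          [0.6, 0.4]])
exercise_ids = [101, 102, 103]             # made-up exercise numbers

label_names = ["mastered", "not mastered"]
predicted = probabilities.argmax(axis=1)   # index of the larger probability per row

# Mapping from exercise number to label; the paper stores this as a mapping file.
exercise_labels = {eid: label_names[int(k)] for eid, k in zip(exercise_ids, predicted)}
print(exercise_labels)
```

Exercises 101 and 103 are labeled as mastered and exercise 102 as not mastered, because the first probability is the larger one only in the first and third rows.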

4. Experimental Results

4.1. Dataset

A total of 24,092 exercises were collected, of which 9449 remained after data cleaning. Among them, there are 4892 lexical exercises, 4325 grammatical exercises, and 232 comprehensive questions. Examples of exercises are shown in Table 1.

In the experiment on the GCN-based personalized knowledge graph, the dataset is divided into a training set and a test set containing 70% and 30% of the data, that is, 6614 and 2835 exercises respectively.

4.2. Experiment on Personalized Knowledge Graph

Table 1. Dataset example.

A knowledge graph is a machine-friendly form of knowledge representation that can be readily recognized. Constructing a knowledge graph in the field of education can not only integrate complicated knowledge but also clarify the complicated connections between knowledge points. In this paper, we crawl Junior High School English exercises from the Internet with a web crawler and clean the data, which includes handling missing values by removing or completing the affected records and then correcting or removing records with logic errors and duplicate values. The processed structured data is used to construct the original knowledge graph with the Neo4j graph database. The knowledge graph of Junior High School English exercises is shown in Figure 3.

When generating the JHSE knowledge graph, the relationship between two nodes is determined by comparing the cosine similarity of their sentence vectors with a threshold. Different thresholds produce different knowledge graphs. In this paper, five thresholds, 0.5, 0.6, 0.7, 0.8 and 0.9, are selected to observe the experimental results, which are shown in Table 2.
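The effect of the threshold on the number of edges can be illustrated with random vectors. The sketch below is not the paper's experiment: the 50 toy vectors, their dimension and the helper names are assumptions, and only the five threshold values come from the text.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T

def count_edges(similarity, threshold):
    """Number of undirected edges whose similarity exceeds the threshold."""
    upper = np.triu(similarity, k=1)       # ignore the diagonal and duplicate pairs
    return int((upper > threshold).sum())

rng = np.random.default_rng(0)
toy_vectors = rng.random(size=(50, 16))    # 50 hypothetical exercises, 16 dimensions
similarity = cosine_similarity_matrix(toy_vectors)

# Edge count for each of the five thresholds used in the paper.
edge_counts = {t: count_edges(similarity, t) for t in (0.5, 0.6, 0.7, 0.8, 0.9)}
print(edge_counts)
```

Raising the threshold can only remove edges, so the edge count never increases from one threshold to the next. This is the mechanism behind the trade-off between a dense, noisy graph and a sparse graph with too little neighborhood information.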

Table 2. Experimental results of different thresholds.

Figure 3. Knowledge graph of JHSE.

From the above experimental results, we can see that the model’s results improve as the threshold increases. When the threshold reaches 0.8, the model performs best; when the threshold is increased further to 0.9, the model’s performance begins to decline. This result is not difficult to understand. When the threshold is low, the knowledge graph contains many relations and a lot of noisy data, so the model may learn many unnecessary features during the convolution operation. When the threshold is high, the knowledge graph contains relatively few relations, the features learned during the convolution operation are not comprehensive, and the model’s performance also decreases.

From the experimental results, we can also see that when the number of training iterations reaches 295, the accuracy of the model on the training set is at its highest; in other words, the model has been fully trained, and further increasing the number of iterations would cause over-fitting. Besides the number of iterations, many other parameters need to be set when training the model, such as the learning rate, the number of convolutional layers and the number of neurons in each convolutional layer. The settings of these parameters are shown in Table 3.

With these settings, the personalized Junior High School English knowledge graph can be generated, as shown in Figure 4.

It is worth noting that the number of convolutional layers cannot be too large, because the convolution process of the GCN uses the features of the surrounding nodes to represent the current node. After many convolution operations, the representations of all nodes may become very similar, and the resulting lack of difference between nodes reduces the model’s performance.
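This smoothing effect can be observed in a simplified setting. The sketch below repeatedly applies only the normalized adjacency matrix, leaving out the weight matrices and ReLU of a real GCN layer, to random features on a hypothetical six-node ring graph, and tracks the average cosine similarity between node representations.

```python
import numpy as np

def normalize_adjacency(adjacency):
    """A_tilde = D_tilde^{-1/2} (A + I_N) D_tilde^{-1/2}."""
    a_self = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_self.sum(axis=1))
    return a_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def mean_pairwise_cosine(h):
    """Average cosine similarity over all pairs of distinct rows of h."""
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = h.shape[0]
    return (sim.sum() - n) / (n * (n - 1))   # drop the n diagonal entries (about 1 each)

# Hypothetical ring graph with six nodes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0
    A[(i + 1) % n, i] = 1.0

rng = np.random.default_rng(0)
H = rng.normal(size=(n, 4))                  # random initial node features
a_tilde = normalize_adjacency(A)

similarity_by_depth = []
for _ in range(50):                          # 50 propagation steps, no weights or ReLU
    H = a_tilde @ H
    similarity_by_depth.append(mean_pairwise_cosine(H))
print(similarity_by_depth[0], similarity_by_depth[-1])
```

As the number of propagation steps grows, the average similarity approaches 1, meaning the nodes become nearly indistinguishable. A trained network with nonlinearities behaves differently in detail, so this is only an intuition for why a small number of layers, two in this paper, is preferred.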

Table 3. Parameter settings.

Figure 4. Personalized knowledge graph.

5. Conclusion and Future Work

This paper shows how to use a graph convolutional network to generate personalized knowledge graphs from the Junior High School English knowledge graph. It introduces the process of graph convolution in detail, analyzes the experimental results, and identifies the model parameters that give better results. This work has important reference value for research on the construction of subject-specific personalized knowledge graphs in the field of education. In future work, we will investigate whether better models can improve the experimental results on the basis of the GCN model.

Acknowledgements

This work is supported by the National Natural Science Foundation (No. 61972436).

References

[1] Cao, X.M. and Zhu, Y. (2014) Research on Personalized Learning Platform from the Perspective of Learning Analysis. Research on Open Education, 20, 67-74.

[2] Kong, J., Guo, Y.C. and Guo, G.W. (2016) Personalized Learning Supported by Technology: A New Trend to Promote Students’ Development. China Educational Technology, 4, 88-94.

[3] Song, Y.J., Wong, L.H. and Looi, C.K. (2012) Fostering Personalized Learning in Science Inquiry Supported by Mobile Technologies. Educational Technology Research & Development, 60, 679-701. https://doi.org/10.1007/s11423-012-9245-6

[4] Zhou, H.B. (2018) Research on Promoting Students’ Personalized Learning Based on Adaptive Learning Platform. E-Educational Research, 39, 122-128.

[5] Wu, H.Y. (2015) The Integration of Personalized Learning Concept and Flip Classroom Teaching Mode. Modern Educational Technology, 25, 46-52.

[6] Ma, X.C., Zhong, S.C. and Xu, D. (2017) Research on Support Model and Implementation Mechanism of Personalized Adaptive Learning System from the Perspective of Big Data. China Educational Technology, 4, 97-102.

[7] Chen, M. and Yang, X.M. (2016) Design and Implementation of Personalized Learning Evaluation System Based on Process Information in Ubiquitous Learning Environment. China Educational Technology, 6, 21-26.

[8] Liu, Z.Y., Han, X. and Sun, M.S. (2020) Representation Learning of World Knowledge. In: Knowledge Graph and Deep Learning, Tsinghua University Press, Beijing, 19-25.

[9] Bollacker, K., Evans, C. and Paritosh, P. (2008) Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. Proceedings of KDD, Association for Computing Machinery Press, New York, 1247-1250. https://doi.org/10.1145/1376616.1376746

[10] Auer, S., Bizer, C. and Kobilarov, G. (2007) DBpedia: A Nucleus for a Web of Open Data. Proceedings of ISWC, Springer, Berlin, Heidelberg, 722-735. https://doi.org/10.1007/978-3-540-76298-0_52

[11] Mikolov, T., Chen, K. and Corrado, G. (2013) Efficient Estimation of Word Representations in Vector Space. Proceedings of ICLR, USA. arXiv:1301.3781

[12] Wang, H.F., Ding, J. and Hu, F.H. (2020) Summary of Large-Scale Enterprise Knowledge Graph Practice. Computer Engineering, 46, 1-13.

[13] Hu, F.H. (2015) Research on Construction Method of Chinese Knowledge Map Based on Multiple Data Sources. Ph.D. Thesis, East China University of Technology, Shanghai.

[14] Zhang, C.H., Peng, C. and Luo, M.Q. (2020) Construction and Reasoning of Knowledge Graph in Mathematics Curriculum. Computer Science, 47, 573-578.

[15] Li, Z., Dong, X.X. and Zhou, D.D. (2019) Research on Human-Computer Cooperation Construction Method and Application of Knowledge Graph in Adaptive Learning System. Modern Educational Technology, 29, 80-86.

[16] Liu, S.Z. (2020) Research on the Construction of Robot Education Knowledge Map in Primary and Secondary Schools. Ph.D. Thesis, Minzu University of China, Beijing.

[17] Kipf, T.N. and Welling, M. (2016) Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907

[18] Liu, L.F., Yang, L. and Zhang, S.W. (2015) Analysis of Microblog Emotional Tendency Based on Convolution Neural Network. Journal of Chinese Information Processing, 29, 159-165.

[19] Chang, L., Deng, X.M. and Zhou, M.Q. (2016) Convolution Neural Network in Image Understanding. Acta Automatica Sinica, 42, 1300-1312.

[20] Yang, T.Q. and Huang, S.X. (2018) Application of Improved Convolution Neural Network in Classification and Recommendation. Application Research of Computers, 35, 974-977.