TY - THES T1 - Scalable graph kernels A1 - Shervashidze,Nino Y1 - 2012/10/09 N2 - Graph-structured data are becoming more and more abundant in many fields of science and engineering, such as social network analysis, molecular biology, chemistry, computer vision, or program verification. To exploit these data, one needs data analysis and machine learning methods that are able to efficiently handle large-scale graph data sets. Successfully applying machine learning and data analysis methods to graphs requires the ability to efficiently compare and represent graphs. Standard solutions to these problems are either NP-hard, not expressive enough, or difficult to adapt to a problem at hand. Graph kernels have attracted considerable interest in the machine learning community in the last decade as a promising solution to the above-mentioned issues. Despite significant progress in the design and improvement of graph kernels in the past few years, existing graph kernels do not measure up to the current needs of machine learning on large, labeled graphs: Even the most efficient existing kernels need O(n^3) runtime to compare a pair of graphs with n nodes, or cannot take into account node and edge labels. Our primary goal in this thesis is the design of efficient and expressive kernels for machine learning on graphs. We first focus on the design of generic graph kernels that can be applied to graphs with or without labels. Our main contributions to this end are the following: First, we speed up the exact computation of graphlet kernels from O(n^k) to O(nd^{k-1}) for a pair of graphs, where n is the size of the graphs, k is the size of considered graphlets, and d is the maximum degree in the given graphs. Second, we define a new kernel on graphs, the Weisfeiler-Lehman subtree kernel, which is the first graph kernel scaling linearly in the number of edges in the given graph set. In our experiments on benchmark graph data sets from chemoinformatics and bioinformatics, the Weisfeiler-Lehman subtree kernel gracefully scales up to large graphs, outperforms all existing graph kernels in speed, and yields highly competitive performance in graph classification. Third, we generalize the Weisfeiler-Lehman subtree kernel to a family of kernels that includes many known graph kernels as special cases. This generalization enables existing graph kernels to take into account more information about the graph topology, and thereby become more expressive. In the last part of this thesis, we present two examples of applications: Based on our previous contributions, we propose specialized node kernels for pixel classification in remote sensing images, and graph kernels for chemical shift prediction in structural bioinformatics. Our kernels make it possible for the first time to take advantage of the rich graph structure in these applications. The Weisfeiler-Lehman kernels we propose here now allow graph kernels to scale to large, labeled graphs. They open the door to manifold applications of graph kernels in numerous domains which deal with graphs whose size and attributes could not be handled by graph kernels before. KW - Graph KW - Maschinenlernen auf Graphen KW - Graphkerne KW - Merkmalrepräsantation von Graphen KW - Graphenvergleich CY - Tübingen PB - Universitätsbibliothek Tübingen AD - Wilhelmstr. 32, 72074 Tübingen UR - http://tobias-lib.uni-tuebingen.de/volltexte/2012/6461 ER -