The Max Planck Institute for Informatics is a research institute in computer science with a focus on algorithms and their applications in a broad sense. It hosts fundamental research as well a research for various application domains . It is part of the Max-Planck-Gesellschaft, Germany's largest society for fundamental research.The research institutes of the Max Planck Society have a national and international reputation as “Centres of Excellence” for pure research. The institute consists of five departments and two research groups: The Algorithms and Complexity Department is headed by Prof. Dr. Kurt Mehlhorn, The Computer Vision and Multimodal Computing Department is headed by Prof. Dr. Bernt Schiele, The Department Computational Biology and Applied Algorithmics is headed by Prof. Dr. Thomas Lengauer, Ph.D. The Computer Graphics Department is headed by Prof. Dr. Hans-Peter Seidel The Databases and Information Systems Department is headed by Prof. Dr. Gerhard Weikum Research Group Automation of Logic is headed by Prof. Dr. Christoph Weidenbach The Independent Research Group Computational Genomics and Epidemiology is headed by Dr. Alice McHardy.Previously, it included the following departments: The Programming Logics Department was headed by Prof. Dr. Harald Ganzinger Members of the institute have received various awards. Professor Kurt Mehlhorn and Professor Hans-Peter Seidel received the Gottfried Wilhelm Leibniz Prize, Professor Kurt Mehlhorn and Professor Thomas Lengauer received the Konrad-Zuse-Medal, and in 2004 Professor Harald Ganzinger received the Herbrand Award.The institute, along with the Max Planck Institute for Software Systems , the German Research Centre for Artificial Intelligence and the entire Computer Science department of Saarland University, is involved in the Internationales Begegnungs- und Forschungszentrum für Informatik.The International Max Planck Research School for Computer Science is the graduate school of the MPII and the MPI-SWS. It was founded in 2000 and offers a fully funded PhD-Program in cooperation with Saarland University. Dean is Prof. Dr. Gerhard Weikum. Wikipedia.
Dollar P.,California Institute of Technology |
Wojek C.,Max Planck Institute for Informatics |
Schiele B.,Max Planck Institute for Informatics |
Perona P.,California Institute of Technology
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2012
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple data sets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: 1) We put together a large, well-annotated, and realistic monocular pedestrian detection data set and study the statistics of the size, position, and occlusion patterns of pedestrians in urban scenes, 2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and 3) we evaluate the performance of sixteen pretrained state-of-the-art detectors across six data sets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians. © 2012 IEEE.
Kratsch S.,University Utrecht |
Wahlstrom M.,Max Planck Institute for Informatics
Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS | Year: 2012
The existence of a polynomial kernel for Odd Cycle Transversal was a notorious open problem in parameterized complexity. Recently, this was settled by the present authors (Kratsch and Wahlströom, SODA 2012), with a randomized polynomial kernel for the problem, using matroid theory to encode flow questions over a set of terminals in size polynomial in the number of terminals (rather than the total graph size, which may be superpolynomially larger). In the current work we further establish the usefulness of matroid theory to kernelization by showing applications of a result on representative sets due to Lovász (Combinatorial Surveys 1977) and Marx (TCS 2009). We show how representative sets can be used to give a polynomial kernel for the elusive Almost 2-sat problem (where the task is to remove at most k clauses to make a 2-CNF formula satisfiable), solving a major open problem in kernelization. We further apply the representative sets tool to the problem of finding irrelevant vertices in graph cut problems, that is, vertices which can be made undeletable without affecting the status of the problem. This gives the first significant progress towards a polynomial kernel for the Multiway Cut problem, in particular, we get a polynomial kernel for Multiway Cut instances with a bounded number of terminals. Both these kernelization results have significant spin-off effects, producing the first polynomial kernels for a range of related problems. More generally, the irrelevant vertex results have implications for covering min-cuts in graphs. In particular, given a directed graph and a set of terminals, we can find a set of size polynomial in the number of terminals (a cut-covering set) which contains a minimum vertex cut for every choice of sources and sinks from the terminal set. Similarly, given an undirected graph and a set of terminals, we can find a set of vertices, of size polynomial in the number of terminals, which contains a minimum multiway cut for every partition of the terminals into a bounded number of sets. Both results are polynomial time. We expect this to have further applications, in particular, we get direct, reduction rule-based kernelizations for all problems above, in contrast to the indirect compression-based kernel previously given for Odd Cycle Transversal. All our results are randomized, with failure probabilities which can be made exponentially small in the size of the input, due to needing a representation of a matroid to apply the representative sets tool. © 2012 IEEE.
Kratsch S.,University Utrecht |
Wahlstrom M.,Max Planck Institute for Informatics
Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms | Year: 2012
The ODD CYCLE TRANSVERSAL problem (OCT) asks whether a given graph can be made bipartite by deleting at most k of its vertices. In a breakthrough result Reed, Smith, and Vetta (Operations Research Letters, 2004) gave a O(4 kkmn) time algorithm for it, the first algorithm with polynomial runtime of uniform degree for every fixed k. It is known that this implies a polynomial-time compression algorithm that turns OCT instances into equivalent instances of size at most O(4 k), a so-called kernelization. Since then the existence of a polynomial kernel for OCT, i.e., a kernelization with size bounded polynomially in k, has turned into one of the main open questions in the study of kernelization. Despite the impressive progress in the area, including the recent development of lower bound techniques (Bodlaender et al., ICALP 2008; Fortnow and Santhanam, STOC 2008) and meta-results on kernelizations for graph problems on planar and other sparse graph classes (Bodlaender et al., FOCS 2009; Fomin et al., SODA 2010), the existence of a polynomial kernel for OCT has remained open, even when the input is restricted to be planar. This work provides the first (randomized) polynomial kernelization for OCT. We introduce a novel kernelization approach based on matroid theory, where we encode all relevant information about a problem instance into a matroid with a representation of size polynomial in k. For OCT, the matroid is built to allow us to simulate the computation of the iterative compression step of the algorithm of Reed, Smith, and Vetta, applied (for only one round) to an approximate odd cycle transversal which it is aiming to shrink to size k. The process is randomized with one-sided error exponentially small in k, where the result can contain false positives but no false negatives, and the size guarantee is cubic in the size of the approximate solution. Combined with an O(√log n)-approximation (Agarwal et al., STOC 2005), we get a reduction of the instance to size O(k 4.5), implying a randomized polynomial kernelization. Interestingly, the known lower bound techniques can be seen to exclude randomized kernels that produce no false negatives, as in fact they exclude even co-nondeterministic kernels (Dell and van Melkebeek, STOC 2010). Therefore, our result also implies that deterministic kernels for OCT cannot be excluded by the known machinery. Copyright © SIAM.
Bock C.,Austrian Academy of Sciences |
Bock C.,Medical University of Vienna |
Lengauer T.,Max Planck Institute for Informatics
Nature Reviews Cancer | Year: 2012
Drug resistance is a common cause of treatment failure for HIV infection and cancer. The high mutation rate of HIV leads to genetic heterogeneity among viral populations and provides the seed from which drug-resistant clones emerge in response to therapy. Similarly, most cancers are characterized by extensive genetic, epigenetic, transcriptional and cellular diversity, and drug-resistant cancer cells outgrow their non-resistant peers in a process of somatic evolution. Patient-specific combination of antiviral drugs has emerged as a powerful approach for treating drug-resistant HIV infection, using genotype-based predictions to identify the best matched combination therapy among several hundred possible combinations of HIV drugs. In this Opinion article, we argue that HIV therapy provides a 'blueprint' for designing and validating patient-specific combination therapies in cancer. © 2012 Macmillan Publishers Limited. All rights reserved.
Adamaszek A.,Max Planck Institute for Informatics |
Wiese A.,Max Planck Institute for Informatics
Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS | Year: 2013
In the Maximum Weight Independent Set of Rectangles (MWISR) problem we are given a set of n axis-parallel rectangles in the 2D-plane, and the goal is to select a maximum weight subset of pairwise non-overlapping rectangles. Due to many applications, e.g. in data mining, map labeling and admission control, the problem has received a lot of attention by various research communities. We present the first (1+ε)- Approximation algorithm for the MWISR problem with quasipolynomial running time 2poly(log n/ε). In contrast, the best known polynomial time approximation algorithms for the problem achieve superconstant approximation ratios of O(log log n) (unweighted case) and O(log n/ log log n) (weighted case). Key to our results is a new geometric dynamic program which recursively subdivides the plane into polygons of bounded complexity. We provide the technical tools that are needed to analyze its performance. In particular, we present a method of partitioning the plane into small and simple areas such that the rectangles of an optimal solution are intersected in a very controlled manner. Together with a novel application of the weighted planar graph separator theorem due to Arora et al.  this allows us to upper bound our approximation ratio by 1 + ε. Our dynamic program is very general and we believe that it will be useful for other settings. In particular, we show that, when parametrized properly, it provides a polynomial time (1 + ε)-approximation for the special case of the MWISR problem when each rectangle is relatively large in at least one dimension. Key to this analysis is a method to tile the plane in order to approximately describe the topology of these rectangles in an optimal solution. This technique might be a useful insight to design better polynomial time approximation algorithms or even a PTAS for the MWISR problem. In particular, note that our results imply that the MWISR problem is not APX-hard, unless NP ⊆ DTIME(2polylog (n)). Copyright © 2013 by The Institute of Electrical and Electronics Engineers, Inc.
Neumann T.,Max Planck Institute for Informatics |
Weikum G.,Max Planck Institute for Informatics
VLDB Journal | Year: 2010
RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2. 0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing. The physical design is identical for all RDF-3X databases regardless of their workloads, and completely eliminates the need for index tuning by exhaustive indexes for all permutations of subject-property-object triples and their binary and unary projections. These indexes are highly compressed, and the query processor can aggressively leverage fast merge joins with excellent performance of processor caches. The query optimizer is able to choose optimal join orders even for complex queries, with a cost model that includes statistical synopses for entire join paths. Although RDF-3X is optimized for queries, it also provides good support for efficient online updates by means of a staging architecture: direct updates to the main database indexes are deferred, and instead applied to compact differential indexes which are later merged into the main indexes in a batched manner. Experimental studies with several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching, manyway star-joins, and long path-joins demonstrate that RDF-3X can outperform the previously best alternatives by one or two orders of magnitude. © 2009 Springer-Verlag.
Suchanek F.,Max Planck Institute for Informatics |
Weikum G.,Max Planck Institute for Informatics
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2013
The proliferation of knowledge-sharing Communities such as Wiki-pedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Endeavors of this kind include projects such as DBpedia, Freebase, KnowItAll, ReadTheWeb, and YAGO. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. They contain millions of entities and hundreds of millions of facts about them. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disam-biguating natural-language text, semantic search for entities and relations in Web and enterprise data, and entity-oriented analytics over unstructured contents. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications. Particular emphasis will be on the twofold role of knowledge bases for big-data analytics: using scalable distributed algorithms for harvesting knowledge from Web and text sources, and leveraging entity-centric knowledge for deeper interpretation of and better intelligence with Big Data. Copyright © 2013 ACM.
Bringmann K.,Max Planck Institute for Informatics
Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS | Year: 2014
The Fréchet distance is a well-studied and very popular measure of similarity of two curves. Many variants and extensions have been studied since Alt and Godau introduced this measure to computational geometry in 1991. Their original algorithm to compute the Fréchet distance of two polygonal curves with n vertices has a runtime of O(n2 log n). More than 20 years later, the state of the art algorithms for most variants still take time more than O(n2/log n), but no matching lower bounds are known, not even under reasonable complexity theoretic assumptions. To obtain a conditional lower bound, in this paper we assume the Strong Exponential Time Hypothesis or, more precisely, that there is no O∗((2-δ)N) algorithm for CNF-SAT for any delta > 0. Under this assumption we show that the Fréchet distance cannot be computed in strongly subquadratic time, i.e., in time O(n2-δ) for any delta > 0. This means that finding faster algorithms for the Fréchet distance is as hard as finding faster CNF-SAT algorithms, and the existence of a strongly subquadratic algorithm can be considered unlikely. Our result holds for both the continuous and the discrete Fréchet distance. We extend the main result in various directions. Based on the same assumption we (1) show non-existence of a strongly subquadratic 1.001-approximation, (2) present tight lower bounds in case the numbers of vertices of the two curves are imbalanced, and (3) examine realistic input assumptions (c-packed curves). © 2014 IEEE.
Bringmann K.,Max Planck Institute for Informatics
Computational Geometry: Theory and Applications | Year: 2012
The measure problem of Klee asks for the volume of the union of n axis-parallel boxes in a fixed dimension d. We give an O(n( d+2)/3) time algorithm for the special case of all boxes being cubes or, more generally, fat boxes. Previously, the fastest run-time was nd/ 22 O(log*n), achieved by the general case algorithm of Chan [SoCG 2008]. For the general problem our run-time would imply a breakthrough for the k-clique problem. © 2011 Elsevier B.V. All rights reserved.
Lawyer G.,Max Planck Institute for Informatics
Scientific Reports | Year: 2015
Centrality measures such as the degree, k-shell, or eigenvalue centrality can identify a network's most influential nodes, but are rarely usefully accurate in quantifying the spreading power of the vast majority of nodes which are not highly influential. The spreading power of all network nodes is better explained by considering, from a continuous-time epidemiological perspective, the distribution of the force of infection each node generates. The resulting metric, the expected force, accurately quantifies node spreading power under all primary epidemiological models across a wide range of archetypical human contact networks. When node power is low, influence is a function of neighbor degree. As power increases, a node's own degree becomes more important. The strength of this relationship is modulated by network structure, being more pronounced in narrow, dense networks typical of social networking and weakening in broader, looser association networks such as the Internet. The expected force can be computed independently for individual nodes, making it applicable for networks whose adjacency matrix is dynamic, not well specified, or overwhelmingly large.