Knight Capital Group

Jersey City, NJ, United States


Yu F.,Knight Capital Group | Ko K.-I.,National Chiao Tung University | Ko K.-I.,King Abdulaziz University
Theoretical Computer Science | Year: 2013

In this paper, we study the parallel complexity of analytic functions. We investigate the complexity of computing the derivatives, integrals, and zeros of NC or logarithmic-space computable analytic functions, where NC denotes the complexity class of sets acceptable by polynomial-size, polylogarithmic-depth, uniform Boolean circuits. It is shown that the derivatives and integrals of NC (or logarithmic-space) computable analytic functions remain NC (or, respectively, logarithmic-space) computable. We also study the problem of finding all zeros of an NC computable analytic function inside an NC computable Jordan curve, and show that, under a uniformity condition on the function values on the Jordan curve, all zeros can be found in NC. © 2013 Elsevier B.V. All rights reserved.

Bordewich M.,Durham University | Mihaescu R.,Knight Capital Group
IEEE/ACM Transactions on Computational Biology and Bioinformatics | Year: 2013

Distance-based phylogenetic methods attempt to reconstruct an accurate phylogenetic tree from an estimated matrix of pairwise distances between taxa. This paper examines two distance-based algorithms (GreedyBME and FastME) that are based on the principle of minimizing the balanced minimum evolution score of the output tree in relation to the given estimated distance matrix. This is also the principle that underlies the neighbor-joining (NJ) algorithm. We show that GreedyBME and FastME both reconstruct the entire correct tree if the input data are quartet consistent, and also that if the maximum error of any distance estimate is ε, then both algorithms output trees containing all sufficiently long edges of the true tree: those having length at least 3ε. That is to say, the algorithms have edge safety radius 1/3. In contrast, quartet consistency of the data is not sufficient to guarantee the NJ algorithm reconstructs the correct tree, and moreover, the NJ algorithm has edge safety radius of 1/4: only edges of the true tree of length at least 4ε can be guaranteed to appear in the output. These results give further theoretical support to the experimental evidence suggesting FastME is a more suitable distance-based phylogeny reconstruction method than the NJ algorithm. © 2004-2012 IEEE.

Khandekar R.,Knight Capital Group | Kortsarz G.,Rutgers University | Mirrokni V.,Google
Algorithmica | Year: 2014

Graph clustering is an important problem with applications to bioinformatics, community discovery in social networks, distributed computing, and more. While most of the research in this area has focused on clustering using disjoint clusters, many real datasets have inherently overlapping clusters. We compare overlapping and non-overlapping clusterings in graphs in the context of minimizing their conductance. It is known that allowing clusters to overlap gives better results in practice. We prove that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting. For minimizing the maximum conductance over the clusters, we give examples demonstrating that allowing overlaps can yield significantly better clusterings, namely, one that has much smaller optimum. In addition for the min-max variant, the overlapping version admits a simple approximation algorithm, while our algorithm for the non-overlapping version is complex and yields a worse approximation ratio due to the presence of the additional constraint. Somewhat surprisingly, for the problem of minimizing the sum of conductances, we found out that allowing overlap does not help. We show how to apply a general technique to transform any overlapping clustering into a non-overlapping one with only a modest increase in the sum of conductances. This uncrossing technique is of independent interest and may find further applications in the future. We consider this work as a step toward rigorous comparison of overlapping and non-overlapping clusterings and hope that it stimulates further research in this area. © 2013 Springer Science+Business Media New York.
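As a rough illustration of the two objectives compared in this abstract, the sketch below computes the conductance of a cluster and then the maximum and the sum over a family of clusters, which may overlap. It assumes the standard definition of conductance (cut edges divided by the smaller of the two volumes) on an unweighted, undirected graph; the paper's exact definition may differ, and the names `conductance`, `max_conductance`, and `sum_conductance` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set

# Assumed representation: adjacency sets of an unweighted, undirected graph.
Graph = Dict[Hashable, Set[Hashable]]


def conductance(graph: Graph, cluster: Iterable[Hashable]) -> float:
    """Cut edges leaving the cluster divided by min(vol(S), vol(V - S))."""
    inside = set(cluster)
    # Each edge with exactly one endpoint inside is counted once, from the inside.
    cut = sum(1 for u in inside for v in graph[u] if v not in inside)
    vol_in = sum(len(graph[u]) for u in inside)
    vol_out = sum(len(graph[u]) for u in graph if u not in inside)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


def max_conductance(graph: Graph, clusters: List[Set[Hashable]]) -> float:
    """Min-max objective: the worst conductance among the clusters."""
    return max(conductance(graph, c) for c in clusters)


def sum_conductance(graph: Graph, clusters: List[Set[Hashable]]) -> float:
    """Min-sum objective: the total conductance over the clusters."""
    return sum(conductance(graph, c) for c in clusters)
```

Because each cluster is scored on its own, the same functions apply whether or not the clusters share vertices; only the feasibility constraint (disjoint or overlapping) changes between the two settings.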

Yu F.,Knight Capital Group | Ko K.-I.,State University of New York at Stony Brook | Ko K.-I.,King Abdulaziz University
Theoretical Computer Science | Year: 2013

We study, in this paper, the relationship among the classes of logarithmic-space computable real numbers under different representations. We consider logarithmic-space computable real numbers under the Cauchy function representation, the general left cut representation, the standard left cut representation, and the binary expansion representation. It is shown that the relationship among these classes of real numbers depends on the relationship between the discrete complexity classes P1 and L1, the classes of tally sets in P and L, respectively. First, if P1=L1, then the relationship among the four classes of logarithmic-space computable real numbers is the same as that among these classes of polynomial-time computable real numbers. On the other hand, if P1≠L1, then we get different relationships from those among classes of polynomial-time computable real numbers. For instance, while the classes of polynomial-time computable real numbers under the general left cut and the Cauchy function representations are equivalent, we show, under the assumption of P1≠L1, that the class of logarithmic-space computable real numbers under the general left cut representation properly contains the class of logarithmic-space computable real numbers under the Cauchy function representation. In addition, if P1≠L1, then the two classes of logarithmic-space computable real numbers under the standard left cut and the Cauchy function representations are incomparable. © 2012 Elsevier B.V. All rights reserved.

Hajiaghayi M.,University of Maryland University College | Khandekar R.,Knight Capital Group | Kortsarz G.,Rutgers University | Nutov Z.,Open University of Israel
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

In the Fixed Cost k-Flow problem, we are given a graph G = (V,E) with edge-capacities {u_e | e ∈ E} and edge-costs {c_e | e ∈ E}, a source-sink pair s,t ∈ V, and an integer k. The goal is to find a minimum cost subgraph H of G such that the minimum capacity of an st-cut in H is at least k. We show that Group Steiner is a special case of Fixed Cost k-Flow, thus obtaining the first polylogarithmic lower bound for the problem; this also implies the first nonconstant lower bounds for the Capacitated Steiner Network and Capacitated Multicommodity Flow problems. We then consider two special cases of Fixed Cost k-Flow. In the Bipartite Fixed-Cost k-Flow problem, we are given a bipartite graph G = (A ∪ B,E) and an integer k > 0. The goal is to find a node subset S ⊆ A ∪ B of minimum size |S| such that G has k pairwise edge-disjoint paths between S ∩ A and S ∩ B. We give an O(√k log k) approximation for this problem. We also show that we can compute a solution of optimum size with Ω(k/polylog(n)) paths, where n = |A| + |B|. In the Generalized-P2P problem we are given an undirected graph G = (V,E) with edge-costs and integer charges {b_v : v ∈ V}. The goal is to find a minimum-cost spanning subgraph H of G such that every connected component of H has non-negative charge. This problem originated in a practical project for shift design [10]. Besides that, it generalizes many problems such as Steiner Forest, k-Steiner Tree, and Point to Point Connection. We give a logarithmic approximation algorithm for this problem. Finally, we consider a related problem called Connected Rent or Buy Multicommodity Flow and give a log^{3+ε} n approximation scheme for it using Group Steiner techniques. © 2014 Springer International Publishing.

Hajiaghayi M.,University of Maryland University College | Khandekar R.,Knight Capital Group | Khani M.R.,University of Maryland University College | Kortsarz G.,Rutgers University
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

In the Movement Repairmen (MR) problem we are given a metric space (V, d) along with a set R of k repairmen r1, r2,...,rk with their start depots s1, s2,...,sk ∈ V and speeds v1, v2,...,vk ≥ 0 respectively and a set C of m clients c1, c2,...,cm having start locations s1′, s2′,...,sm′ ∈ V and speeds v1′, v2′,...,vm′ ≥ 0 respectively. If t is the earliest time a client cj is collocated with any repairman (say, ri) at a node u, we say that the client is served by ri at u and that its latency is t. The objective in the (Sum-MR) problem is to plan the movements for all repairmen and clients to minimize the sum (average) of the clients' latencies. The motivation for this problem comes, for example, from Amazon Locker Delivery [Ama10] and USPS gopost [Ser10]. We give the first O(log n)-approximation algorithm for the Sum-MR problem. In order to solve Sum-MR we formulate an LP for the problem and bound its integrality gap. Our LP has exponentially many variables; therefore we need a separation oracle for the dual LP. This separation oracle is an instance of the Neighborhood Prize Collecting Steiner Tree (NPCST) problem, in which we want to find a tree with weight at most L collecting the maximum profit from the clients by visiting at least one node from their neighborhoods. The NPCST problem, even with the possibility to violate both the tree weight and neighborhood radii, is still very hard to approximate. We deal with this difficulty by using an LP with geometrically increasing segments of the time line, and by giving a tricriteria approximation for the problem. The rounding needs a relatively involved analysis. We give a constant approximation algorithm for Sum-MR in Euclidean space where the speeds of the clients differ by a constant factor. We also give a constant approximation for the makespan variant. © 2013 Springer-Verlag.

Manshadi F.M.,Max Planck Institute for Informatics | Awerbuch B.,Johns Hopkins University | Gemulla R.,Max Planck Institute for Informatics | Khandekar R.,Knight Capital Group | And 2 more authors.
Proceedings of the VLDB Endowment | Year: 2013

Generalized matching problems arise in a number of applications, including computational advertising, recommender systems, and trade markets. Consider, for example, the problem of recommending multimedia items (e.g., DVDs) to users such that (1) users are recommended items that they are likely to be interested in, (2) every user gets neither too few nor too many recommendations, and (3) only items available in stock are recommended to users. State-of-the-art matching algorithms fail at coping with large real-world instances, which may involve millions of users and items. We propose the first distributed algorithm for computing near-optimal solutions to large-scale generalized matching problems like the one above. Our algorithm is designed to run on a small cluster of commodity nodes (or in a MapReduce environment), has strong approximation guarantees, and requires only a poly-logarithmic number of passes over the input. In particular, we propose a novel distributed algorithm to approximately solve mixed packing-covering linear programs, which include but are not limited to generalized matching problems. Experiments on real-world and synthetic data suggest that a practical variant of our algorithm scales to very large problem sizes and can be orders of magnitude faster than alternative approaches.

Wolf J.,IBM | Balmin A.,IBM | Rajan D.,Lawrence Livermore National Laboratory | Hildrum K.,IBM | And 4 more authors.
VLDB Journal | Year: 2012

We consider MapReduce clusters designed to support multiple concurrent jobs, concentrating on environments in which the number of distinct datasets is modest relative to the number of jobs. In such scenarios, many individual datasets are likely to be scanned concurrently by multiple Map phase jobs. As has been noticed previously, this scenario provides an opportunity for Map phase jobs to cooperate, sharing the scans of these datasets, and thus reducing the costs of such scans. Our paper has three main contributions over previous work. First, we present a novel and highly general method for sharing scans and thus amortizing their costs. This concept, which we call cyclic piggybacking, has a number of advantages over the more traditional batching scheme described in the literature. Second, we notice that the various subjobs generated in this manner can be assumed in an optimal schedule to respect a natural chain precedence ordering. Third, we describe a significant but natural generalization of the recently introduced FLEX scheduler for optimizing schedules within the context of this cyclic piggybacking paradigm, which can be tailored to a variety of cost metrics. Such cost metrics include average response time, average stretch, and any minimax-type metric, for a total of 11 separate and standard metrics in all. Moreover, most of this carries over in the more general case of overlapping rather than identical datasets as well, employing what we will call semi-shared scans. In such scenarios, chain precedence is replaced by arbitrary precedence, but we can still handle 8 of the original 11 metrics. The overall approach, including both cyclic piggybacking and the FLEX scheduling generalization, is called CIRCUMFLEX. We describe some practical implementation strategies, and we evaluate the performance of CIRCUMFLEX via a variety of simulation and real benchmark experiments. © 2012 Springer-Verlag.

Khandekar R.,Knight Capital Group | Schieber B.,IBM | Shachnai H.,Technion - Israel Institute of Technology | Tamir T.,The Interdisciplinary Center
Journal of Scheduling | Year: 2015

Consider the following scheduling problem. We are given a set of jobs, each having a release time, a due date, a processing time, and demand for machine capacity. The goal is to schedule all jobs non-preemptively in their release-time deadline windows on machines that can process multiple jobs simultaneously, subject to machine capacity constraints, with the objective to minimize the total busy time of the machines. Our problem naturally arises in power-aware scheduling, optical network design, and customer service systems, among others. The problem is APX-hard by a simple reduction from the subset sum problem. A main result of this paper is a 5-approximation algorithm for general instances. While the algorithm is simple, its analysis involves a non-trivial charging scheme which bounds the total busy time in terms of work and span lower bounds on the optimum. This improves and extends the results of Flammini et al. (Theor Comput Sci 411(40–42):3553–3562, 2010). We extend this approximation to the case of moldable jobs, where the algorithm also needs to choose, for each job, one of several processing-time versus demand configurations. Better bounds and exact algorithms are derived for several special cases, including proper interval graphs, intervals forming a clique and laminar families of intervals. © 2014, Springer Science+Business Media New York.
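To make the objective in this abstract concrete, the following sketch evaluates a given schedule rather than constructing one: it checks machine capacity, sums the busy time of each machine (the length of the union of the intervals of its jobs), and computes the work lower bound mentioned in the abstract. It is not the paper's 5-approximation algorithm. The representation of a scheduled job as a `(start, end, demand)` tuple and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: a scheduled job occupies [start, end) with a demand.
Job = Tuple[float, float, float]


def union_length(intervals: List[Tuple[float, float]]) -> float:
    """Total length covered by a set of possibly overlapping intervals."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def respects_capacity(machine: List[Job], capacity: float) -> bool:
    """True if the summed demand of running jobs never exceeds the capacity."""
    events = []
    for start, end, demand in machine:
        events.append((start, 1, demand))   # job begins
        events.append((end, 0, -demand))    # job finishes (processed first on ties)
    load = 0.0
    for _, _, delta in sorted(events):
        load += delta
        if load > capacity:
            return False
    return True


def total_busy_time(schedule: List[List[Job]]) -> float:
    """Sum over machines of the time during which the machine runs some job."""
    return sum(union_length([(s, e) for s, e, _ in m]) for m in schedule)


def work_lower_bound(jobs: List[Job], capacity: float) -> float:
    """Total demand-weighted processing time divided by machine capacity."""
    return sum((e - s) * d for s, e, d in jobs) / capacity
```

For example, with capacity 2, one machine running jobs `(0, 4, 1)` and `(2, 6, 1)` and a second machine running `(5, 7, 2)` have busy times 6 and 2, while the work bound for the three jobs is 6.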

Chen H.,Purdue University | Chen H.,Knight Capital Group | Li N.,Purdue University | Gates C.S.,Purdue University | And 2 more authors.
Proceedings of ACM Symposium on Access Control Models and Technologies, SACMAT | Year: 2010

An operating system relies heavily on its access control mechanisms to defend against local and remote attacks. The complexities of modern access control mechanisms and the scale of possible configurations are often overwhelming to system administrators and software developers. Therefore mis-configurations are very common and the security consequences are serious. Given the popularity and uniqueness of Microsoft Windows systems, it is critical to have a tool to comprehensively examine the access control configurations. However, current studies on Windows access control mechanisms are mostly based on known attack patterns. We propose a tool, WACCA, to systematically analyze the Windows configurations. Given the attacker's initial abilities and goals, WACCA generates an attack graph based on interaction rules. The tool then automatically generates attack patterns from the attack graph. Each attack pattern represents attacks of the same nature. The attack subgraphs and instances are also generated for each pattern. Compared to existing solutions, WACCA is more comprehensive and does not rely on manually defined attack patterns. It also has a unique feature in that it models software vulnerabilities and therefore can find attacks that rely on exploiting these vulnerabilities. We study two attack cases on a Windows Vista host and discuss the analysis results.
