Nvidia Inc.

Santa Clara, CA, United States


Badr Y.,University of California at Los Angeles | Ma K.-W.,Nvidia Inc. | Gupta P.,University of California at Los Angeles
Journal of Micro/Nanolithography, MEMS, and MOEMS | Year: 2014

With the use of subwavelength photolithography, some layouts can have low printability and, accordingly, low yield due to the existence of bad patterns even though they pass design rule checks. A reasonable approach is to select some of the candidate bad patterns as forbidden. These are the ones with a high yield impact or low routability impact, and these are to be prohibited in the design phase. The rest of the candidate bad patterns may be fixed in the postroute stage in a best-effort manner. The process developers need to optimize the process to be friendly to the patterns of high routability impact. Hence, an evaluation method is required early in the process to assess the impact of forbidding layout patterns on routability. We propose pattern-driven design rule evaluation (pattern-DRE), which can be used to evaluate the importance of patterns for the routability of the standard cells and, accordingly, select the set of bad patterns to forbid in the design. The framework can also be used to compare restrictive patterning technologies [e.g., litho-etch-litho-etch (LELE), self-aligned double patterning (SADP), self-aligned quadruple patterning (SAQP), self-aligned octuple patterning (SAOP)]. Given a set of design rules and a set of forbidden patterns, pattern-DRE generates a set of virtual standard cells; then it finds the possible routing options for each cell without using any of the forbidden patterns. Finally, it reports the routability metrics. We present a few studies that illustrate the use cases of the framework. The first study compares LELE to SADP by using a set of forbidden patterns that are allowed by LELE but not by SADP. Another study compares LELE to extreme ultraviolet lithography from the routability aspect by prohibiting patterns that have LELE native conflicts. In addition, we present a study that investigates the effect of placing the active area of the transistors close to the P/N interface instead of close to the power rails. 
© 2014 Society of Photo-Optical Instrumentation Engineers (SPIE).

Butt S.,Rutgers University | Butt S.,Nvidia Inc. | Ganapathy V.,Rutgers University | Srivastava A.,AT and T Labs Research
Proceedings of the 5th ACM Symposium on Cloud Computing, SOCC 2014 | Year: 2014

Self-service Cloud Computing (SSC) [7] is a recently-proposed model to improve the security and privacy of client data on public cloud platforms. It prevents cloud operators from snooping on or modifying client VMs and provides clients the flexibility to deploy security services, such as VM introspection tools, on their own VMs. SSC achieves these goals by modifying the hypervisor privilege model. This paper focuses on the unique challenges involved in building a control plane for an SSC-based cloud platform. The control plane is the layer that facilitates interaction between hosts in the cloud infrastructure as well as between the client and the cloud. We describe a number of novel features in SSC's control plane, such as its ability to allow specification of VM dependencies, flexible deployment of network middleboxes, and new VM migration protocols. We report on our design and implementation of SSC's control plane, and present experimental evaluation of services implemented atop the control plane. Copyright © 2014 by the Association for Computing Machinery, Inc. (ACM).

Shi X.,Google | Su F.,Nvidia Inc. | Peir J.-K.,University of Florida
Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS | Year: 2014

Maintaining hardware cache coherence on future CMPs becomes increasingly important and difficult as the number of cores keeps accelerating in mainstream multicore chips. The simple snooping-bus coherence scheme is not suitable due to its limited scalability. The sparse coherence directory approach may incur extra cache invalidations due to a topological mismatch between the coherence directory and the directories of all cache modules. In this paper, we propose an innovative CMP coherence directory that has three important properties. First, the directory has a simple set-associative design with small associativity. The number of directory entries matches the total number of cache blocks. Second, an augmented Directory Lookaside Table (DLT) allows blocks to be displaced from their primary sets in the coherence directory for alleviating hot-set conflicts. Third, to avoid expensive presence bits, each copy of a block along with the located core ID occupies a separate directory entry. Performance evaluations based on multithreaded and multi-programmed workloads demonstrate significant advantages of the proposed CMP directory over directories with traditional set-associative or skewed associative designs. © 2014 IEEE.

Patnaik A.,Missouri University of Science and Technology | Zhang Y.,Missouri University of Science and Technology | De S.,Missouri University of Science and Technology | Pommerenke D.,Missouri University of Science and Technology | And 2 more authors.
IEEE International Symposium on Electromagnetic Compatibility | Year: 2014

In a high-speed connector system, coupling to an adjacent cable-connector system is not uncommon. It is essential to understand and quantify this coupling path in order to mitigate the coupling. Though simulation-based methods are widely used, such an approach is generally very time consuming and computationally resource hungry. A measurement-based method for quantifying the EMI coupling path between a high-speed connector and an adjacent connector on the same board is presented. This is based on measured S-parameters for the mode conversion representing the coupling from the differential mode in one connector to the antenna mode current on the other connector-cable system. The method is validated on two test structures comparing estimated and measured radiated field emissions. © 2014 IEEE.

Zhang Y.,University of Pittsburgh | Yang J.,University of Pittsburgh | Li W.,University of Pittsburgh | Wang L.,Nanjing University | Jin L.,Nvidia Inc.
Journal of Network and Computer Applications | Year: 2010

Wireless sensor networks have recently emerged as a promising computing model for many civilian and military applications. Sensor nodes in such a network are subject to varying forms of attacks since they are left unattended after deployment. Compromised nodes can, for example, tamper with legitimate reports or inject false reports in order to either distract the user from reaching the right decision or deplete the precious energy of relay nodes. Most of the current designs take the en-network detection approach: misbehaved nodes are detected by their neighboring watchdog nodes; false reports are detected and dropped by trusted en-route relay nodes, etc. However, en-network designs are insufficient to defend against collaborative attacks when many compromised nodes collude with each other in the network. In this paper we propose COOL, a COmpromised nOde Locator for detecting and locating compromised nodes once they misbehave in the network. It is based on the observation that for a well-behaved sensor node, the set of outgoing messages should be equal to the set of incoming and locally generated or dropped messages. However, comparing the message sets for different nodes is not enough to identify attacks as their sanity is unknown. We exploit a proven collision-resilient hashing scheme, termed incremental hashing, to sign the incoming, outgoing and locally generated/dropped message sets. The hash values are then sent to the sink for trusted comparisons. We discuss how to securely collect these hash values and then confidently locate compromised nodes. The scheme can also be combined with existing en-route false report filtering schemes to achieve both early false report dropping and accurate compromised node isolation. Through identifying and excluding compromised nodes, the COOL protocol prevents further damage from these nodes and forms a reliable and energy-conserving sensor network. © 2009 Elsevier Ltd. All rights reserved.
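
The message-set balance described in this abstract can be illustrated with a small Python sketch. It is not the COOL protocol: the abstract does not give the hashing construction, so the sketch substitutes a simple unkeyed additive hash (SHA-256 digests summed modulo 2^256) and assumes the balance "incoming + generated = outgoing + dropped". The names `IncrementalSetHash`, `hash_of`, and `node_is_consistent` are hypothetical.

```python
import hashlib

MODULUS = 2 ** 256  # digests are summed modulo this value (an assumption)


def message_digest(message: bytes) -> int:
    """Map one message to an integer via SHA-256."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


class IncrementalSetHash:
    """Order-independent hash that is updated one message at a time."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, message: bytes) -> None:
        # Adding a message costs one digest and one modular addition,
        # so a node never has to store the messages themselves.
        self.value = (self.value + message_digest(message)) % MODULUS


def hash_of(messages) -> IncrementalSetHash:
    """Build the hash of a collection of messages."""
    h = IncrementalSetHash()
    for m in messages:
        h.add(m)
    return h


def node_is_consistent(incoming, generated, outgoing, dropped) -> bool:
    """Sink-side check of the assumed balance for a single node."""
    lhs = (incoming.value + generated.value) % MODULUS
    rhs = (outgoing.value + dropped.value) % MODULUS
    return lhs == rhs


# A node that forwards r1, drops r2, and emits its own report r3.
h_in = hash_of([b"r1", b"r2"])
h_gen = hash_of([b"r3"])
h_out = hash_of([b"r3", b"r1"])
h_drop = hash_of([b"r2"])
honest = node_is_consistent(h_in, h_gen, h_out, h_drop)

# The same node after altering r1 before forwarding it.
h_out_bad = hash_of([b"r3", b"r1-tampered"])
tampered = node_is_consistent(h_in, h_gen, h_out_bad, h_drop)
```

Because the toy hash is unkeyed and self-reported, it only shows the bookkeeping; a real deployment would need the keyed, securely collected values the abstract refers to.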

Patel K.,Nvidia Inc. | Annavaram M.,University of Southern California | Pedram M.,University of Southern California
IEEE Transactions on Computers | Year: 2013

Due to the prohibitive cost of data center setup and maintenance, many small-scale businesses rely on hosting centers to provide the cloud infrastructure to run their workloads. Hosting centers host services of the clients on their behalf and guarantee quality of service as defined by service level agreements (SLAs). To reduce energy consumption and to maximize profit, it is critical to optimally allocate resources to meet client SLAs. Optimal allocation is a nontrivial task due to 1) resource heterogeneity, where energy consumption of a client task varies depending on the allocated resources, and 2) lack of energy proportionality, where energy cost for a task varies based on server utilization. In this paper, we introduce a generalized Network Flow-based Resource Allocation framework, called NFRA, for energy minimization and profit maximization. NFRA provides a unified framework to model profit maximization under a wide range of SLAs. We will demonstrate the simplicity of this unified framework by deriving optimal resource allocations for three different SLAs. We derive workload demands and server energy consumption data from SPECWeb2009 benchmark results to demonstrate the efficiency of the NFRA framework. © 1968-2012 IEEE.

Arora A.,Indian Institute of Technology Delhi | Harne M.,Nvidia Inc. | Sultan H.,Indian Institute of Technology Delhi | Bagaria A.,Indian Institute of Technology Delhi | Sarangi S.R.,Indian Institute of Technology Delhi
IEEE Transactions on Parallel and Distributed Systems | Year: 2015

NUCA caches have traditionally been proposed as a solution for mitigating wire delays, and delays introduced due to complex networks on chip. Traditional approaches have reported significant performance gains with intelligent block placement, location, replication, and migration schemes. In this paper, we propose a novel approach in this space, called FP-NUCA. It differs from conventional approaches, and relies on a novel method of co-designing the last level cache and the network on chip. We artificially constrain the communication pattern in the NUCA cache such that all the messages travel along a few predefined paths (fast paths) for each set of banks. We leverage this communication pattern by designing a new type of NOC router called the Freeze router, which augments a regular router by adding a layer of circuitry that gates the clock of the regular router when there is a fast path message waiting to be transmitted. Messages along the fast path do not require buffering, switching, or routing. We incorporate a bank predictor with our novel NOC for reducing the number of messages, and resultant energy consumption. We compare our performance with state of the art protocols, and report speedups of up to 31 percent (mean: 6.3 percent), and ED2 reduction up to 46 percent (mean: 10.4 percent) for a suite of Splash and Parsec benchmarks. We implement the Freeze router in VHDL and show that the additional fast path logic has minimal area and timing overheads. © 2014 IEEE.

Heyman T.,Nvidia Inc. | Smith D.,Nvidia Inc. | Mahajan Y.,Nvidia Inc. | Leong L.,Nvidia Inc. | Abu-Haimed H.,Atrenta Inc.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

This paper presents an application of formal methods to the verification of hardware power management modules. The property being verified is called Dominant Controllability and is a property of a netlist node and a subset of the inputs. The property holds if there exists an assignment to the subset of the inputs such that it sets the node to 0/1 regardless of the values at the rest of the inputs. Verification of power management modules in recent CPU and GPU designs includes hundreds of such properties. Two approaches are described for verifying such properties: netlist optimization and QBF solving. In the latter case, a QBF preprocessor is used, requiring partial model reconstruction. Each method can be used independently or combined into a third algorithm that heuristically selects a method based on its performance on a design. Experimental results for these methods are presented and discussed. © 2014 Springer International Publishing Switzerland.
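
The definition of Dominant Controllability given in this abstract is an exists-forall statement, which the following Python sketch checks by brute-force enumeration. This is only an illustration of the property on a toy function; it is not the netlist optimization or QBF-based method the paper describes, and it scales exponentially with the number of inputs. The function `dominant_assignment` and the example node `gated` are hypothetical.

```python
from itertools import product


def dominant_assignment(node, controls, others, target):
    """Return an assignment to `controls` that forces `node` to `target`
    for every valuation of `others`, or None if no such assignment exists.

    `node` is a callable taking a dict of input name -> bool.
    """
    for ctrl_values in product([False, True], repeat=len(controls)):
        ctrl = dict(zip(controls, ctrl_values))
        # The forall part: the node must equal the target for every
        # combination of the remaining inputs.
        forced = all(
            node({**ctrl, **dict(zip(others, rest))}) == target
            for rest in product([False, True], repeat=len(others))
        )
        if forced:
            return ctrl
    return None


def gated(inputs):
    """Toy node: a data signal (a OR b) gated off by a sleep input."""
    return (not inputs["sleep"]) and (inputs["a"] or inputs["b"])


# sleep=True forces the node to 0 whatever a and b are.
force_low = dominant_assignment(gated, ["sleep"], ["a", "b"], False)

# No value of sleep alone can force the node to 1.
force_high = dominant_assignment(gated, ["sleep"], ["a", "b"], True)
```

Here `force_low` is `{"sleep": True}` and `force_high` is `None`, so the node is dominantly controllable to 0 but not to 1 by the `sleep` input.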

Wang R.,Duke University | Bhaskaran B.,Nvidia Inc. | Natarajan K.,Nvidia Inc. | Abdollahian A.,Nvidia Inc. | And 3 more authors.
Proceedings of the IEEE VLSI Test Symposium | Year: 2016

We present a programmable method for shift-clock stagger assignment to reduce power supply noise during system-on-chip (SoC) testing. An SoC design is typically composed of several blocks, and two neighboring blocks that share the same power rails should not be toggled at the same time during shift. Therefore, the proposed programmable method does not assign the same stagger value to neighboring blocks. The positions of all blocks are first analyzed and the shared boundary length between blocks is then calculated. Based on the position relationships between the blocks, a mathematical model is presented to derive an optimal result for small-to-medium sized problems. For larger designs, a heuristic algorithm is proposed and evaluated. We present assignment results as well as power-analysis results and silicon data for industry designs to highlight the effectiveness of the proposed method. © 2016 IEEE.
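
One way to picture the assignment problem in this abstract is as weighted graph coloring: blocks are vertices, shared boundary lengths are edge weights, and stagger values are colors. The Python sketch below uses a plain greedy heuristic under that assumption. It is not the paper's mathematical model or its heuristic, neither of which is specified in the abstract; the objective (total boundary length shared by same-stagger neighbors) and the names `assign_staggers` and `conflict_length` are hypothetical.

```python
def assign_staggers(shared, num_staggers):
    """Greedily assign a stagger value in range(num_staggers) to each block.

    `shared` maps a pair of block names (a, b) to their shared boundary length.
    """
    neighbors = {}
    for (a, b), length in shared.items():
        neighbors.setdefault(a, {})[b] = length
        neighbors.setdefault(b, {})[a] = length

    # Place the blocks with the most shared boundary first.
    order = sorted(neighbors, key=lambda blk: (-sum(neighbors[blk].values()), blk))

    assignment = {}
    for blk in order:
        # Boundary length this block would share with already-placed
        # neighbors for each candidate stagger value.
        cost = [0.0] * num_staggers
        for nb, length in neighbors[blk].items():
            if nb in assignment:
                cost[assignment[nb]] += length
        assignment[blk] = min(range(num_staggers), key=lambda s: (cost[s], s))
    return assignment


def conflict_length(shared, assignment):
    """Total boundary length between neighbors given the same stagger."""
    return sum(
        length for (a, b), length in shared.items()
        if assignment[a] == assignment[b]
    )


# Hypothetical floorplan: boundary lengths in arbitrary units.
shared = {("A", "B"): 4, ("B", "C"): 2, ("A", "C"): 1, ("C", "D"): 3}

three = assign_staggers(shared, 3)
two = assign_staggers(shared, 2)
conflict_three = conflict_length(shared, three)
conflict_two = conflict_length(shared, two)
```

With three stagger values no neighbors collide in this example; with two, the A-B-C triangle cannot be separated completely, and the greedy pass leaves only the shortest boundary (A-C) in conflict.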

Catania V.,University of Catania | Patti D.,University of Catania | Palesi M.,University of Catania | Spadaccini A.,University of Catania | Fazzino F.,Nvidia Inc.
WSEAS Transactions on Information Science and Applications | Year: 2014

Instruction-set Simulators (ISS) are commonly used in any computer architecture course as primary tools for supporting the teaching activity. Although there are several simulation platforms for educational purposes, the lack of a unified and integrated platform often forces educators to use a range of heterogeneous tools to cover the different topics of the syllabus. This paper presents EduMIPS64, a free, visual, and platform-independent MIPS64 Instruction-Set Simulator designed as a learning aid for topics like instruction pipelining, hazard detection and resolution, exception handling, interrupts, and memory hierarchies. Its dual execution mode - stand-alone application and web applet - allows for inclusion in distance learning courses. Copyright © 2014 - All Rights Reserved.
