We are in a golden era of rapid accelerator architecture innovations driven by emerging applications where the pure pursuit of FLOPS in hardware design with generality and abstraction is being replaced with targeted specialization. Hardware accelerators have become one of the most critical functional blocks of modern heterogeneous computer architectures. These accelerators implement domain-specific functions for machine learning and inference, edge intelligence, secure computing, and so on to improve power efficiency and computation throughput by several orders of magnitude. At the same time, the existing design practice cannot keep up with the pace of innovations due to the exponentially increasing complexity and cost. In particular, verification of functional correctness has become a major factor in design complexity and cost, and the trend is worsening.
This calls for a design paradigm shift including raising the level of abstraction in hardware design to software, scalable and composable verification, and a new programming model amenable for rapid verification, etc. Recent work presented at MICRO/ISCA/FMCAD/PLDI/POPL and other sister conferences indicates a substantial number of researchers in the community with interests in the intersection of accelerator architecture, design, and verification.
This workshop intends to provide a forum to bring researchers from domain-specific languages and compilers, architecture, and formal methods communities with common interests at the intersection of accelerator architecture, design, and verification together to cross-pollinate ideas. We believe that only with interdisciplinary collaboration we can arrest the design complexity and cost trend to keep up with the pace of architectural innovations.
1:00-1:10pm (10mins)
|
Opening Remark
Zhiru Zhang, Cornell University
|
1:10-1:50pm (40mins)
|
Keynote: Formal successes & frontiers in XLS' High Level Synthesis toolchain
Chris Leary, Google
|
1:50-2:15pm (25mins)
|
Talk: Scalable Equivalence Checking for Behavioral Synthesis
Fei Xie, Portland State University
|
2:15-2:40pm (25mins)
|
Talk: Towards A Formally Verified FHE Accelerator Design
Jin Yang, Intel
|
2:40-3:05pm (25mins)
|
Talk: Can We Eliminate the Performance vs. Correctness Tradeoff When Writing Low-Level HPC Code?
Gilbert Bernstein, University of Washington
|
3:05-3:25pm (20mins)
|
Coffee Break
|
3:25-3:50pm (25mins)
|
Talk: Generalizing the ISA to the ILA: A Software/Hardware Interface for Accelerator-rich Platforms
Sharad Malik, Princeton University
|
3:50-4:15pm (25mins)
|
Talk: Scalable Assurance via Verifiable Hardware-Software Contracts
Caroline Trippel, Stanford University
|
4:15-5:00pm (45mins)
|
Panel Discussion: Correctness Assured Accelerator Design: a Holy Grail or an Engineering Feasibility
Moderator: Debjit Pal, University of Illinois at Chicago
Fei Xie, Portland State University
Jin Yang, Intel
Gilbert Bernstein, University of Washington
Sharad Malik, Princeton University
Caroline Trippel, Stanford University
|
Chris Leary
Senior Staff Software Engineer at Google
|
Keynote: Formal successes & frontiers in XLS' High Level Synthesis toolchain
Abstract:
In this talk we describe aspects of formal incorporated and explored in development of the XLS high level synthesis toolchain (github.com/google/xls). The XLS toolchain is used to design hardware blocks via a software-engineering style methodology, yet every operation in XLS' Intermediate Representation can be projected into SMTLib. This property has enabled project successes such as a proving unoptimized IR equivalent to optimized IR, HLS-to-gates Logical Equivalence Checking flow, formal verification of floating point units, and "concolic" test case generation. We'll touch on the promise of being "under the workload" in the XLS compiler, and thereby able to project the input program and annotations on it into formal models for domain-specific theorem proving. We'll also discuss the mixed promise and peril we observe for undefined behavior and exploiting user-specified assumptions. Additionally, we'll cover how this is all developed in open source using open theorem proving engines, and has tie-ins to the broader ecosystem of open hardware toolchains and PDKs that we're co-developing at Google.
|
Caroline Trippel
Assistant Professor at Stanford University
|
Invited talk: Scalable Assurance via Verifiable Hardware-Software Contracts
Abstract:
Hardware-software (HW-SW) contracts are critical for high-assurance computer systems design and an enabler for software design/analysis tools that find and repair hardware-related bugs in programs. E.g., memory consistency models (MCMs) define what values shared memory loads can return in a parallel program. Emerging security contracts define what program data is susceptible to leakage via hardware side-channels. However, these contracts and the analyses they support are useless if we cannot guarantee microarchitectural compliance, which is a ``grand challenge.'' Notably, some contracts are still evolving e.g., security contracts), making hardware compliance a moving target. Even for mature contracts, comprehensively verifying that a complex microarchitecture implements some abstract contract is a time-consuming endeavor involving teams of engineers, which typically requires resorting to incomplete proofs. Our work takes a radically different approach to the challenge above by synthesizing HW-SW contracts from RTL implementations. This talk will give an overview of our recent work in synthesizing formally proven-correct HW-SW contracts from RTL to support concurrency and security verification of (processor) microarchitectures.
|
Fei Xie
Professor at Portland State University
|
Invited talk: Scalable Equivalence Checking for Behavioral Synthesis
Abstract: In this talk, we present a scalable equivalence checking framework for behavioral synthesis. Behavioral synthesis entails application of a sequence of transformations to compile a high-level description of a hardware design (e.g., in C/C++/SystemC) into a register-transfer level (RTL) implementation. Our framework covers behavioral synthesis flows end-to-end, including compiler transformations, scheduling, and binding and code generation. This framework achieves its scalability through close integration with behavioral synthesis flows.
|
Gilbert Bernstein
Assistant Professor at the University of Washington
|
Invited talk: Can We Eliminate the Performance vs. Correctness Tradeoff When Writing Low-Level HPC Code?
Abstract: Most applications benefiting from accelerators (especially ML accelerators) rely on hand-optimized high-performance kernel libraries to get access to new hardware, and ensure a high level of performance. However, these kernel libraries are still written and optimized by hand, at great expense using low-level C and assembly code. This is because the performance engineers who write this code, (like the hardware designers on the other side of the ISA from them) require control over the design. What if we designed programming languages specially tailored to the needs of these programmers?
I will discuss two different “user-scheduled” languages we’ve built along these lines. (1) Exo is an imperative language which turns the compiler “inside out” by externalizing control of code optimization directly to the user, and by replacing hardware-specific backends (the compiler writers’ responsibility) with user-level libraries (the performance engineers’ responsibility). (2) ATL is a simple functional tensor language, which we have embedded in Coq. Rewrites of ATL programs thereby become lemmas, and user-scheduling directives become proof tactics. These languages match the performance of highly tuned linear algebra, neural net and image processing kernels by using formal verification machinery to expedite the existing optimization process of low-level software performance engineers. |
Jin Yang
Senior Principal Engineer at Intel Corporation
|
Invited talk: Towards A Formally Verified FHE Accelerator Design
Abstract: Correctness assurance needs to be a first principle in accelerator design, especially for security and privacy applications. We will provide an overview of our approach on designing a formally verified FHE (Fully Homomorphic Encryption) accelerator. It starts with developing a behavioral micro-architecture model of the accelerator and establishing its correctness against its ISA specification by combining modular formal verification with a correct-by-construction argument similar to compiler verification. Each micro-architecture module is then refined with algorithmic details optimized for hardware implementation through a sequence of semantic preserving transformations. Finally, the functional equivalence between the refined module and its manual RTL implementation is formally verified. As a proof-of-concept, an RTL model is also automatically generated from the behavioral model through high level synthesis (HLS).
|
Sharad Malik
George Van Ness Lothrop Professor in Engineering at Princeton University
|
Invited talk: Generalizing the ISA to the ILA: A Software/Hardware Interface for
Accelerator-rich Platforms
Abstract:
The Instruction-Set Architecture (ISA) has long served as the software/hardware interface for programmable processors. The ISA simultaneously serves as a specification for the hardware implementation, and as an abstraction of the hardware for software development. With the advent of multiprocessors, the memory consistency model (MCM) provided the software/hardware interface for processor interactions through shared memory. We are now in an era where accelerator-rich platforms are widely used to deliver the power-performance requirements of emerging applications. Unfortunately, there is no widely accepted software/hardware interface for these platforms - this has implications for both hardware and software development.
My group, in collaboration with others, has been working on the Instruction-Level Abstraction (ILA) as a software-hardware interface that generalizes the notion of ISAs to accelerators. The ILA model of an accelerator is a functional model that defines the response of the accelerator to commands at its interface. These commands serve as “instructions” for the accelerator. Further, we have developed the ILA-MCM model for how the operational ILA model can be integrated with an axiomatic memory consistency model for a detailed functional specification that includes accelerator-processor interactions through shared memory. In this talk I will describe the ILA model, and its application to different use cases for accelerator-rich platforms:
|
Time: October 1st 2022 (Saturday) Afternoon 1 pm - 5 pm CDT
Address: The Westin Chicago River North, 320 N Dearborn St, Chicago, IL 60654