<div><div>Chapter 1: Introduction</div><div>Sets expectation that book describes SYCL 1.2.1 with Intel extensions, and that most extensions are proof points of features that should end up in a future version of SYCL. Overview the notion that different accelerator architectures do well on different workloads, and introduce those architectures (but don’t overdo the topic). Overview/level setting on parallelism and relevant terminology, language landscape, SYCL history.</div><div>• SYCL key feature overview (single source, C++, multi-accelerator) - intended to draw people in and show simple code</div><div>• Language versions and extensions covered by this book</div><div>• Mixed-architecture compute and modern architectures</div><div>• Classes of parallelism</div><div>• Accelerator programming landscape (OpenMP, CUDA, TBB, OpenACC, AMD HCC, Kokkos, RAJA)</div><div>• Evolution of SYCL</div><div><br></div><div>Chapter 2: Where code executes</div><div>Describes which parts of code run natively on CPU versus on "devices". Differentiate between accelerator devices and the "host device". Show more code to increase reader familiarity with program structure.</div><div>• Single source programming model</div><div>• Built-in device selectors</div><div>• Writing a custom device selector</div><div><br></div><div>Chapter 3: Data management and ordering the uses of data</div><div>Overview the primary ways that data is accessible by both host and device(s): USM and buffers. Introduce command groups as futures for execution, and the concept of dependencies between nodes forming a DAG.</div><div>• Intro</div><div>• Unified Shared Memory</div><div>• Buffers</div><div>• DAG mechanism</div><div><br></div><div>Chapter 4: Expressing parallelism</div><div>The multiple alternative constructs for expressing parallelism are hard to comprehend from the spec, especially for anyone without major parallel programming experience. 
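A minimal sketch of two of the forms this chapter will contrast (a basic data-parallel kernel and an explicit ND-range kernel), assuming a SYCL 1.2.1 toolchain; header, namespace, and kernel names here are illustrative and may differ by implementation:

```cpp
#include <CL/sycl.hpp>
using namespace cl::sycl;

int main() {
  queue q;  // default device selection
  buffer<float, 1> buf{range<1>{1024}};

  // Basic data-parallel kernel: one work-item per element.
  q.submit([&](handler& h) {
    auto acc = buf.get_access<access::mode::discard_write>(h);
    h.parallel_for<class basic>(range<1>{1024}, [=](id<1> i) {
      acc[i] = static_cast<float>(i[0]);
    });
  });

  // Explicit ND-range kernel: global size 1024, work-group size 64.
  q.submit([&](handler& h) {
    auto acc = buf.get_access<access::mode::read_write>(h);
    h.parallel_for<class ndr>(nd_range<1>{range<1>{1024}, range<1>{64}},
                              [=](nd_item<1> it) {
      acc[it.get_global_id()] *= 2.0f;
    });
  });
}
```

The basic form hides work-group shape entirely; the ND-range form exposes it, which is the distinction the chapter's "choosing a style" section turns on.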
This chapter must position the parallelism mechanisms relative to each other, and leave the reader with a conceptual understanding of each, plus an understanding of how to use the most common forms.</div><div>• Parallelism within kernels</div><div>• Overview of language features for expressing parallelism</div><div>• Basic data parallel kernels</div><div>• Explicit ND-Range kernels</div><div>• Hierarchical parallelism kernels</div><div>• Choosing a parallelism/coding style</div><div><br></div><div>Chapter 5: Error handling</div><div>SYCL uses C++-style error handling. This is different from, and more modern than, what OpenCL and CUDA users are accustomed to. This chapter must frame the differences, and provide samples from which readers can manage exceptions easily in their code.</div><div>• Exception-based</div><div>• Synchronous and asynchronous exceptions</div><div>• Strategies for error management</div><div>• Fallback queue mechanism</div><div><br></div><div>Chapter 6: USM in detail</div><div>USM is a key usability feature when porting code, for example from C++. When mixed with differing hardware capabilities, the USM landscape isn’t trivial to understand. This key chapter must leave the reader with an understanding of USM on different hardware capabilities, what is guaranteed at each level, and how to write code with USM features.</div><div>• Usability</div><div>• Device capability levels</div><div>• Allocating memory</div><div>• Use of data in kernels</div><div>• Sharing of data between host and devices</div><div>• Data ownership and migration</div><div>• USM as a usability feature</div><div>• USM as a performance feature</div><div>• Relation to OpenCL SVM</div><div><br></div><div>Chapter 7: Buffers in detail</div><div>Buffers will be available on all hardware, and are an important feature for people writing code that doesn’t have pointer-based data structures, particularly when implicit dependence management is desired. 
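The implicit-dependence behavior described above might be illustrated with a sketch like the following, assuming a SYCL 1.2.1 toolchain (kernel name illustrative):

```cpp
#include <CL/sycl.hpp>
#include <vector>
using namespace cl::sycl;

int main() {
  std::vector<int> data(16, 1);
  {
    // The buffer takes ownership of the host data for its lifetime.
    buffer<int, 1> buf{data.data(), range<1>{data.size()}};
    queue q;
    q.submit([&](handler& h) {
      auto acc = buf.get_access<access::mode::read_write>(h);
      h.parallel_for<class incr>(range<1>{16}, [=](id<1> i) {
        acc[i] += 1;
      });
    });
    // A host accessor blocks until the kernel completes, then
    // exposes the data to host code: dependence management is implicit.
    auto host = buf.get_access<access::mode::read>();
  } // Buffer destruction guarantees write-back into the vector.
}
```

No explicit wait or copy call appears anywhere; the accessor declarations alone order the kernel against host access, which is the property the chapter highlights.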
This chapter must cover the more complex aspects of buffers in an accessible way, including when data movement is triggered, sub-buffer dependencies, and advanced host/buffer synchronization (mutexes).</div><div>• Buffer construction</div><div>• Access modes (e.g. discard_write) and set_final_data</div><div>• Device accessors</div><div>• Host accessors</div><div>• Sub-buffers for finer grained DAG dependencies</div><div>• Explicit data motion</div><div>• Advanced buffer data sharing between device and host</div><div><br></div><div>Chapter 8: DAG scheduling in detail</div><div>Must describe the DAG mechanism at a high level, which the spec does not do. Must describe the in-order simplifications, and common gotchas that people hit with the DAG (e.g. attempting to read data before buffer destruction, and therefore before kernel execution is guaranteed).</div><div>• Queues</div><div>• Common gotchas with DAGs</div><div>• Synchronizing with the host program</div><div>• Manual dependency management</div><div><br></div><div>Chapter 9: Local memory and work-group barriers</div><div>• "Local" memory</div><div>• Managing "local" memory</div><div>• Work-group barriers</div><div><br></div><div>Chapter 10: Defining kernels</div><div>• Lambdas</div><div>• Functors</div><div>• OpenCL interop objects</div><div><br></div><div>Chapter 11: Vectors</div><div>• Vector data types</div><div>• Swizzles</div><div>• Mapping to hardware</div><div><br></div><div>Chapter 12: Device-specific extension mechanism</div><div>• TBD</div><div><br></div><div>Chapter 13: Programming for GPUs</div><div>• Use of sub-groups</div><div>• Device partitioning</div><div>• Data movement</div><div>• Images and samplers</div><div>• TBD</div><div><br></div><div>Chapter 14: Programming for CPUs</div><div>• Loop vectorization</div><div>• Use of sub-groups</div><div>• TBD</div><div><br></div><div>Chapter 15: Programming for FPGAs</div><div>• Pipes</div><div>• Memory controls</div><div>• Loop controls</div><div><br></div><div>Chapter 16: Address 
spaces and multi_ptr</div><div>• Address spaces</div><div>• The multi_ptr class</div><div>• Interfacing with external code</div><div><br></div><div>Chapter 17: Using libraries</div><div>• Linking to external code</div><div>• Exchanging data with libraries</div><div><br></div><div>Chapter 18: Working with OpenCL</div><div>• Interoperability</div><div>• Program objects</div><div>• Build options</div><div>• Using SPIR-V kernels</div><div><br></div><div>Chapter 19: Memory model and atomics</div><div>• The memory model</div><div>• Fences</div><div>• Buffer atomics</div><div>• USM atomics</div></div>
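As a working note for the Chapter 5 samples, a sketch of the synchronous/asynchronous exception split, assuming a SYCL 1.2.1 toolchain (kernel name illustrative):

```cpp
#include <CL/sycl.hpp>
#include <iostream>
using namespace cl::sycl;

int main() {
  // Asynchronous exceptions (e.g. from kernel execution) are delivered
  // through a handler passed at queue construction...
  queue q{default_selector{}, [](exception_list errors) {
    for (std::exception_ptr e : errors) {
      try { std::rethrow_exception(e); }
      catch (exception const& se) {
        std::cerr << "Async SYCL exception: " << se.what() << "\n";
      }
    }
  }};

  try {
    q.submit([&](handler& h) {
      h.single_task<class noop>([=] {});
    });
    q.wait_and_throw();  // ...and surfaced by wait_and_throw().
  } catch (exception const& se) {
    // Synchronous exceptions propagate as ordinary C++ exceptions.
    std::cerr << "Sync SYCL exception: " << se.what() << "\n";
  }
}
```

The two delivery paths (handler callback versus ordinary try/catch) are exactly the "synchronous and asynchronous exceptions" distinction the chapter's bullets call out.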