A vector processor, or array processor, is a central processing unit (CPU) whose instruction set includes instructions that operate on one-dimensional arrays of data called vectors. This contrasts with a scalar processor, whose instructions operate on single data items. Vector processors can greatly improve performance on certain workloads, notably numerical simulation and similar tasks. Vector machines appeared in the early 1970s and dominated supercomputer design from the 1970s into the 1990s, most notably in the various Cray platforms. The rapid fall in the price-to-performance ratio of conventional microprocessor designs led to the demise of the vector supercomputer in the late 1990s.
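The scalar/vector distinction can be sketched in C. This is a minimal illustration, not tied to any particular vector machine: it assumes a GCC or Clang compiler, whose `vector_size` attribute lets a single C expression act on every element of a short fixed-length vector. The function names `add_scalar` and `add_vector4` are hypothetical. The scalar version issues one addition per element; the vector version expresses all four additions as one operation, which the compiler may map to a single SIMD instruction on targets that have one.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang vector extension. One v4i32 holds four 32-bit
   elements, playing the role of a short vector register. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Scalar style: each loop pass performs one addition on one element. */
void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Vector style: a single '+' adds all four elements of va and vb. */
void add_vector4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    v4i32 va, vb;
    memcpy(&va, a, sizeof va);       /* load four elements as one vector */
    memcpy(&vb, b, sizeof vb);
    v4i32 vsum = va + vb;            /* one element-wise vector addition */
    memcpy(out, &vsum, sizeof vsum); /* store the four results */
}
```

Both functions produce the same results; whether `add_vector4` actually compiles to one hardware instruction depends on the target and compiler flags.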
Today, most commodity CPUs implement architectures that include instructions for a form of vector processing on multiple data items, typically known as SIMD (Single Instruction, Multiple Data). Common examples include VIS, MMX, SSE, AltiVec and AVX. Vector processing techniques are also found in video game console hardware and graphics accelerators. In 2000, IBM, Toshiba and Sony collaborated to create the Cell processor, which consists of one scalar processor and eight vector processors and found use in the Sony PlayStation 3, among other applications.
Other CPU designs include multiple instructions for vector processing on multiple (vectorized) data sets, an approach typically known as MIMD (Multiple Instruction, Multiple Data) and realized with VLIW (Very Long Instruction Word). Such designs are usually dedicated to a particular application and are not commonly marketed for general-purpose computing. The Fujitsu FR-V VLIW/vector processor combines both technologies.
Many ARM processors include vector or Single Instruction Multiple Data (SIMD) instructions. These enable the processor to perform multiple operations with a single instruction.
Vector processing works by performing multiple operations in parallel with a single instruction. The number and type of operations available depend on the vector extension your processor implements. For example, an ARM processor with the NEON Media Processing Engine can perform up to four 32-bit operations, eight 16-bit operations, or sixteen 8-bit operations simultaneously, depending on the implementation.
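The 4/8/16 figures follow from splitting one 128-bit register into lanes of equal width: 128/32 = 4, 128/16 = 8 and 128/8 = 16. The sketch below models this portably rather than with NEON intrinsics. It assumes GCC or Clang vector extensions, uses 16-byte vector types as stand-ins for a 128-bit NEON register, and the names `lanes_per_128` and `add_u8x16` are hypothetical. It also shows a key property of lane-wise arithmetic: each 8-bit lane wraps on its own, and no carry spills into the neighbouring lane.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang vector extensions. Each type is 16 bytes (128 bits),
   like one NEON register, but divided into lanes of a different width. */
typedef int32_t v4i32 __attribute__((vector_size(16)));
typedef int16_t v8i16 __attribute__((vector_size(16)));
typedef uint8_t v16u8 __attribute__((vector_size(16)));

_Static_assert(sizeof(v4i32) / sizeof(int32_t) == 4, "4 x 32-bit lanes");
_Static_assert(sizeof(v8i16) / sizeof(int16_t) == 8, "8 x 16-bit lanes");
_Static_assert(sizeof(v16u8) / sizeof(uint8_t) == 16, "16 x 8-bit lanes");

/* Number of lanes in a 128-bit register for a given element width. */
int lanes_per_128(int element_bits)
{
    return 128 / element_bits;
}

/* Sixteen 8-bit additions in one vector expression. Each lane wraps
   modulo 256 independently of its neighbours. */
void add_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    v16u8 va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    v16u8 vsum = va + vb;
    memcpy(out, &vsum, sizeof vsum);
}
```

For instance, adding 10 to a lane holding 255 yields 9 in that lane, while the next lane is computed as if the overflow never happened.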
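Using vector instructions can produce a very large performance boost for data-parallel operations, where the same computation is applied to many elements. Use vector processing where possible: a single vector instruction replaces several scalar instructions, which increases performance and reduces code size, so more of the code fits in the instruction cache.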
You can sometimes use vector instructions in a loop as a form of loop unrolling. Because each iteration processes several elements at once, this can reduce the total number of loop iterations by a factor of four or more.
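A minimal sketch of this pattern, assuming GCC or Clang vector extensions, with a 4 x 32-bit vector standing in for one SIMD register such as a NEON Q register. The function `scale_by` is hypothetical; it multiplies every element by `k` and, for illustration only, returns how many loop iterations it executed. The main loop consumes four elements per pass, and a short scalar loop handles the zero to three elements left over when `n` is not a multiple of four.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang vector extension; v4i32 models a 128-bit register. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Multiplies x[0..n) by k in place and returns the number of loop
   iterations executed (vector passes plus scalar tail passes). */
size_t scale_by(int32_t *x, size_t n, int32_t k)
{
    size_t iterations = 0;
    size_t i = 0;
    v4i32 vk = {k, k, k, k};           /* broadcast k into every lane */

    for (; i + 4 <= n; i += 4) {       /* vector body: four elements per pass */
        v4i32 v;
        memcpy(&v, x + i, sizeof v);
        v *= vk;                       /* four multiplications at once */
        memcpy(x + i, &v, sizeof v);
        iterations++;
    }
    for (; i < n; i++) {               /* scalar tail for the leftovers */
        x[i] *= k;
        iterations++;
    }
    return iterations;
}
```

For `n = 10`, the vector loop runs twice and the tail twice, so 4 iterations replace the 10 a purely scalar loop would need. In practice, optimizing compilers often perform this transformation automatically on the plain scalar loop; writing it by hand mainly helps when the compiler cannot prove it is safe.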