- In 1936, R. A. Fisher suggested the first algorithm for pattern recognition (Fisher 1936).
- Aronszajn (1950) introduced the ‘Theory of Reproducing Kernels’.
- In 1957, Frank Rosenblatt invented a linear classifier called the perceptron (the simplest kind of feedforward neural network); see Rosenblatt (1962).
- Vapnik and Lerner (1963) introduced the Generalized Portrait algorithm (the algorithm implemented by support vector machines is a nonlinear generalization of the Generalized Portrait algorithm).
- Aizerman, Braverman and Rozonoer (1964) introduced the geometrical interpretation of the kernels as inner products in a feature space.
- Vapnik and Chervonenkis (1964) further developed the Generalized Portrait algorithm.
- Cover (1965) discussed large margin hyperplanes in the input space and also sparseness.
- Similar optimisation techniques were used in pattern recognition by Mangasarian (1965).
- The use of slack variables to overcome the problem of noise and nonseparability was introduced by Smith (1968).
- Duda and Hart (1973) discuss large margin hyperplanes in the input space.
- The field of ‘statistical learning theory’ began with Vapnik and Chervonenkis (1974) (in Russian).
- SVMs can be said to have started when statistical learning theory was developed further with Vapnik (1979) (in Russian).
- A German translation of Vapnik and Chervonenkis’s 1974 book appeared as Wapnik and Tscherwonenkis (1979).
- Vapnik (1982) wrote an English translation of his 1979 book.
- See also the PhD thesis by Hassoun (1986) for related early work.
- Several statistical mechanics papers (for example Anlauf and Biehl (1989)) suggested using large margin hyperplanes in the input space.
- Poggio and Girosi (1990) and Wahba (1990) discuss the use of kernels.
- Bennett and Mangasarian (1992) improved upon Smith’s 1968 work on slack variables.
- SVMs close to their current form were first introduced with a paper at the COLT 1992 conference (Boser, Guyon and Vapnik 1992).
- In 1995 the soft margin classifier was introduced by Cortes and Vapnik (1995); in the same year the algorithm was extended to the case of regression by Vapnik (1995) in *The Nature of Statistical Learning Theory*.
- The papers by Bartlett (1998) and Shawe-Taylor *et al.* (1998) gave the first rigorous statistical bound on the generalisation of hard margin SVMs.
- Shawe-Taylor and Cristianini (2000) gave statistical bounds on the generalisation of soft margin algorithms and for the regression case.

