Machine learning toolbox
For many years, the programming language of choice for machine learning was one of the following: Python, R, MATLAB, or C++. This is not due to any specific language features, but to the infrastructure around these languages: libraries and tools. Swift is a relatively young programming language, and anyone who chooses it as a primary tool for machine learning development has to start from the most basic building blocks and build their own tools and libraries. Recently, Apple became more open to third-party Python machine learning tools: Core ML can work with some of them.
Here is a list of components that are needed for successful machine learning research and development, with examples of popular libraries and tools of each type:
- Linear algebra: A machine learning developer needs data structures like vectors, matrices, and tensors with compact syntax and hardware-accelerated operations on them. Examples in other languages: NumPy, the MATLAB and R standard libraries, and Torch.
- Probability theory: All kinds of random data generation: random numbers and collections of them, probability distributions, permutations, shuffling of collections, weighted sampling, and so on. Examples: NumPy and the R standard library.
- Data input-output: In machine learning, we are usually most interested in parsing and saving data in the following formats: plain text, tabular files like CSV, SQL databases, and internet formats such as JSON, XML, and HTML (including web scraping). There are also a lot of domain-specific formats.
- Data wrangling: Table-like data structures and data engineering tools: dataset cleaning, querying, splitting, merging, shuffling, and so on. Examples: pandas and dplyr.
- Data analysis/statistics: Descriptive statistics, hypothesis testing, and other statistical methods. Examples: the R standard library and a lot of CRAN packages.
- Visualization: Statistical data visualization (not pie charts): graph visualization, histograms, mosaic plots, heat maps, dendrograms, 3D surfaces, spatial and multidimensional data visualization, and interactive visualization. Examples: Matplotlib, Seaborn, Bokeh, ggplot2, ggmap, Graphviz, and D3.js.
- Symbolic computations: Symbolic computation and automatic differentiation. Examples: SymPy, Theano, and Autograd.
- Machine learning packages: Machine learning algorithms and solvers. Examples: scikit-learn, Keras, XGBoost, e1071, and caret.
- Interactive prototyping environment: Examples: Jupyter, RStudio, MATLAB, and iTorch.
This list does not include domain-specific tools, such as NLP or computer vision libraries.
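To give a sense of what starting from the basic building blocks can look like, here is a minimal sketch in Swift of the first item in the list: a vector type with a few operations, written using only the standard library. The `Vector` type and its members are hypothetical names introduced for illustration, not part of any existing library, and these plain implementations are not hardware-accelerated.

```swift
// Hypothetical minimal vector type, for illustration only.
struct Vector {
    var elements: [Double]

    // Dot product of two vectors of the same length.
    func dot(_ other: Vector) -> Double {
        precondition(elements.count == other.elements.count, "Vector sizes must match")
        return zip(elements, other.elements).reduce(0.0) { sum, pair in
            sum + pair.0 * pair.1
        }
    }

    // Arithmetic mean of the elements (NaN for an empty vector).
    var mean: Double {
        return elements.reduce(0.0, +) / Double(elements.count)
    }

    // Element-wise sum of two vectors of the same length.
    static func + (lhs: Vector, rhs: Vector) -> Vector {
        precondition(lhs.elements.count == rhs.elements.count, "Vector sizes must match")
        return Vector(elements: zip(lhs.elements, rhs.elements).map { $0 + $1 })
    }
}

let a = Vector(elements: [1, 2, 3])
let b = Vector(elements: [4, 5, 6])

print(a.dot(b))          // 32.0
print((a + b).elements)  // [5.0, 7.0, 9.0]
print(a.mean)            // 2.0
```

Even this small fragment covers only a fraction of what NumPy offers out of the box, which illustrates how much infrastructure the established languages already provide.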
As of summer 2017, I'm not aware of Swift alternatives of comparable quality and functionality to any of the mentioned tools. Also, none of these popular libraries are directly compatible with Swift, meaning you can't call Keras from your iOS Swift code. All this means that Swift cannot be the primary tool for machine learning research and development. Killing Python is not on Swift's agenda so far; however, there are some libraries and tools, compatible to varying degrees, with which a wide range of machine learning problems can be addressed in your Swift applications. In the following chapters, we will build our own tools or introduce third-party tools as we need them. We discuss machine learning libraries specifically in Chapter 10, Natural Language Processing. Still, for anyone who wants to work with machine learning, it's more than advisable to know at least one of the following well: Python, R, or MATLAB.