{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# SFA to Approximate Spectral Embeddings\n\nSlow Feature Analysis can be interpreted as a function approximation version\nof a spectral embedding, specifically Laplacian Eigenmaps (LEMs).\n\nAt their core, LEMs eigen-decompose a graph-derived matrix and use the eigenvectors\nas embedding. The graph is either given or generated from data, e.g., by connecting\nnearest neighbors.\n\nTo use standard SFA, a time-series is generated using this graph by performing a\nrandom walk on it. The actual time-series is the sequence of features of visited nodes,\ntheir embedding is the mapping learned by SFA.\n\nBenefits:\n\n* unseen points can be embedded without Nystr\u00f6m approximation.\n* instead of calculating the eigen-decomposition on the N-by-N Laplacian, only d-by-d covariance matrices need to be decomposed.\n* specific choice of the model family used for embedding.\n\nDownsides:\n\n* depends on randomly sub-sampled data.\n* found mapping might fail to extract structure due to limited model class (linear/extended linear).\n\n.. admonition:: Note\n\n   The described method works, but is not necessarily efficient. The not-yet-implemented\n   Graph-based version of SFA (GSFA) is generally the smarter choice and does not require\n   a random walk.\n\nBelow you see an example of random-walk SFA applied to a \"wavy circle\" dataset (N=1000) that clearly exhibits structure.\nA dimensionality reduction using PCA does not work, because the data has almost\nisotropic covariance. An embedding using LEMs leaves the global structure intact, but cannot naturally be applied to unseen points.\nSFA finds a linear mapping to the same end, while also generalizing embedding of unseen points.\n\nLEMs and the random-walk use a 50-nearest-neighbor embedding. The random-walk time-series consists of three sequences with\n500 steps each.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport scipy as sp\nimport matplotlib as mpl\nfrom mpl_toolkits.mplot3d import Axes3D\nimport matplotlib.pyplot as plt\nfrom sklearn.manifold import SpectralEmbedding as LEM\nfrom sklearn.preprocessing import PolynomialFeatures\nfrom sklearn.decomposition import PCA\nfrom sksfa import SFA\nfrom sksfa.utils import randomWalkFromCSC\n\nk = 0.125\nn_points = 1000\n\nu = np.linspace(0, 2 * np.pi / k, n_points)\nphase_offset = 0.25 * np.pi\nv =  k * u + phase_offset\n\n\ndata = np.zeros((n_points, 3))\nif True:\n    data[:, 0] = np.cos(v)\n    data[:, 1] = np.sin(v)\n    data[:, 2] = np.cos(u)\nelse:\n    a, b = 10, 5\n    data[:, 0] = (a + b * np.cos(u))*np.cos(v)\n    data[:, 1] = (a + b * np.cos(u))*np.sin(v)\n    data[:, 2] = b * np.sin(u)\n\nlem = LEM(2, n_neighbors=50)\nembedded = lem.fit_transform(data)\nA = lem.affinity_matrix_\n\nrestart_rate = 500\nn_random_samples = 1500\ntrajectory = randomWalkFromCSC(sp.sparse.csc_matrix(A), n_random_samples, restart_rate=restart_rate)\nwalk_data = data[trajectory]\n\nvisited = np.unique(trajectory)\nnon_visited = np.setdiff1d(np.arange(0, n_points), visited)\n\npf = PolynomialFeatures(1)\nsfa = SFA(2, batch_size=restart_rate if restart_rate > 0 else None)\nsf = sfa.fit_transform(pf.fit_transform(walk_data))\noos = sfa.transform(pf.transform(data[non_visited]))\n\npca = PCA(2)\npc = pca.fit_transform(data)\n\nfig = plt.figure()\nfig.set_size_inches(8, 12)\nfig.subplots_adjust(hspace=0.5)\n\nax_3d = fig.add_subplot(321, projection='3d')\nax_3d.set_title(\"Wavy circle data\")\nax_rw = fig.add_subplot(323, projection='3d')\nax_rw.set_title(\"Random walk samples\")\nax_lem = fig.add_subplot(322)\nax_lem.set_title(\"Spectral embedding\")\nax_sfa= fig.add_subplot(324)\nax_sfa.set_title(\"Linear SFA (in-sample)\")\nax_oos= fig.add_subplot(325)\nax_oos.set_title(\"Linear SFA (out-of-sample)\")\nax_pca= fig.add_subplot(326)\nax_pca.set_title(\"PCA embedding\")\n\nax_3d.scatter(data[:, 0], data[:, 1], data[:, 2], c=u, cmap=\"hsv\", s=2)\nax_rw.scatter(walk_data[:, 0], walk_data[:, 1], walk_data[:, 2], c=u[trajectory], cmap=\"hsv\", s=2)\n\nax_lem.scatter(embedded[:, 0], embedded[:, 1], c=u, cmap=\"hsv\", s=3)\nax_sfa.scatter(sf[:, 0], sf[:, 1], c=u[trajectory], cmap=\"hsv\", s=3)\nax_oos.scatter(oos[:, 0], oos[:, 1], c=u[non_visited], cmap=\"hsv\", s=3)\nax_pca.scatter(pc[:, 0], pc[:, 1], c=u, cmap=\"hsv\", s=3)\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}