
BigML Python Bindings



BigML makes machine learning easy by taking care of the details required to add data-driven decisions and predictive power to your company. Unlike other machine learning services, BigML creates beautiful predictive models that can be easily understood and interacted with.

These BigML Python bindings allow you to interact with BigML.io, the API for BigML. You can use them to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models, and predictions). For additional information, see the full documentation for the Python bindings on Read the Docs.

This module is licensed under the Apache License, Version 2.0.

Support

Please report problems and bugs to our BigML.io issue tracker.

Discussions about the different bindings take place in the general BigML mailing list, or you can join us in our Campfire chatroom.

Requirements

Python 2.6 and Python 2.7 are currently supported by these bindings.

The basic third-party dependencies are the requests, poster and unidecode libraries. These libraries are automatically installed during the setup.

The bindings will also use simplejson if you happen to have it installed, but that is optional: we fall back to Python’s built-in JSON libraries if simplejson is not found.
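An optional dependency of this kind is commonly handled with a guarded import. The following is a minimal sketch of that pattern, not the bindings' actual source; the alias `json` and the sample payload are chosen here for illustration.

```python
# Prefer simplejson when it is installed; otherwise use the standard library.
try:
    import simplejson as json
except ImportError:
    import json

# Both modules expose the same loads/dumps interface for this use.
payload = json.loads('{"sepal length": 5, "sepal width": 2.5}')
print(payload["sepal width"])
```

Code that only calls `json.loads` and `json.dumps` does not need to know which of the two modules was imported.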

The numpy and scipy libraries are additionally needed if you want to use local predictions for regression models (including the error information) with the proportional missing strategy. Because these libraries are quite heavy and seldom needed, they are not included in the automatic installation dependencies. Some tests in the test suite require them to be installed.

Installation

To install the latest stable release with pip:

$ pip install bigml

You can also install the development version of the bindings directly from the Git repository:

$ pip install -e git://github.com/bigmlcom/python.git#egg=bigml_python

Running the Tests

To run the tests you will need to install lettuce:

$ pip install lettuce

and set up your authentication via environment variables, as explained below. With that in place, you can run the test suite simply by:

$ cd tests
$ lettuce

Some tests need the numpy and scipy libraries to be installed too. They are not automatically installed as a dependency, as they are quite heavy and very seldom used.

Importing the module

To import the module:

import bigml.api

Alternatively you can just import the BigML class:

from bigml.api import BigML

Authentication

All the requests to BigML.io must be authenticated using your username and API key and are always transmitted over HTTPS.

This module will look for your username and API key in the environment variables BIGML_USERNAME and BIGML_API_KEY respectively. You can add the following lines to your .bashrc or .bash_profile to set those variables automatically when you log in:

export BIGML_USERNAME=myusername
export BIGML_API_KEY=ae579e7e53fb9abd646a6ff8aa99d4afe83ac291

With that environment set up, connecting to BigML is a breeze:

from bigml.api import BigML
api = BigML()

Otherwise, you can initialize directly when instantiating the BigML class as follows:

api = BigML('myusername', 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291')

You can also initialize the library to work in the Sandbox environment by passing the parameter dev_mode:

api = BigML(dev_mode=True)
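Before connecting, it can be useful to check that both environment variables are actually visible to Python. The helper below is a hypothetical sketch for illustration (the name `read_credentials` is not part of the bindings) and is written for a current Python 3 interpreter.

```python
import os


def read_credentials(environ=None):
    """Return (username, api_key) from the environment or raise KeyError.

    Hypothetical helper for illustration; it is not part of the bindings.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in ("BIGML_USERNAME", "BIGML_API_KEY")
               if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return environ["BIGML_USERNAME"], environ["BIGML_API_KEY"]


# Example with an explicit mapping instead of the real environment.
example_env = {"BIGML_USERNAME": "myusername", "BIGML_API_KEY": "not-a-real-key"}
username, api_key = read_credentials(example_env)
print(username)
```

The returned pair could then be passed to `BigML(username, api_key)` as in the explicit initialization shown above.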

Quick Start

Imagine that you want to use this CSV file containing the Iris flower dataset to predict the species of a flower whose sepal length is 5 and whose sepal width is 2.5. A preview of the dataset is shown below. It has four numeric fields (sepal length, sepal width, petal length, petal width) and one categorical field (species). By default, BigML considers the last field in the dataset as the objective field (i.e., the field that you want to generate predictions for).

sepal length,sepal width,petal length,petal width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
...
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
...
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
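To make the layout of the file concrete, the sketch below parses a few rows of the preview with Python's standard csv module and picks out the last column, which is the one BigML would treat as the objective field by default. It runs locally and does not contact BigML; the variable names are illustrative, and the code assumes a current Python 3 interpreter.

```python
import csv
import io

# A few rows copied from the preview above.
PREVIEW = """sepal length,sepal width,petal length,petal width,species
5.1,3.5,1.4,0.2,Iris-setosa
5.8,2.7,3.9,1.2,Iris-versicolor
6.8,3.0,5.5,2.1,Iris-virginica
"""

rows = list(csv.reader(io.StringIO(PREVIEW)))
header, records = rows[0], rows[1:]

# The last column is the default objective field; the rest are inputs.
objective_field = header[-1]
input_fields = header[:-1]
species_seen = sorted({record[-1] for record in records})

print(objective_field)
print(input_fields)
print(species_seen)
```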

You can easily generate a prediction following these steps:

from bigml.api import BigML

api = BigML()

source = api.create_source('./data/iris.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)
prediction = api.create_prediction(model, {'sepal length': 5, 'sepal width': 2.5})

You can then print the prediction using the pprint method:

>>> api.pprint(prediction)
species for {"sepal width": 2.5, "sepal length": 5} is Iris-virginica
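If you repeat this chain often, it can be wrapped in a small function. The sketch below is one possible way to do that, under the assumption that the `api` argument exposes the four `create_*` methods used above; `predict_from_csv` and the `RecordingAPI` stand-in are hypothetical names introduced only so the example can run without contacting BigML.

```python
def predict_from_csv(api, csv_path, input_data):
    """Run the source -> dataset -> model -> prediction chain shown above.

    `api` is any object exposing the four create_* methods used in the
    quick start, for example a bigml.api.BigML instance.
    """
    source = api.create_source(csv_path)
    dataset = api.create_dataset(source)
    model = api.create_model(dataset)
    return api.create_prediction(model, input_data)


class RecordingAPI(object):
    """Hypothetical stand-in that records calls instead of contacting BigML."""

    def __init__(self):
        self.calls = []

    def _record(self, name, *args):
        self.calls.append(name)
        return {"step": name, "args": args}

    def create_source(self, path):
        return self._record("create_source", path)

    def create_dataset(self, source):
        return self._record("create_dataset", source)

    def create_model(self, dataset):
        return self._record("create_model", dataset)

    def create_prediction(self, model, input_data):
        return self._record("create_prediction", model, input_data)


fake = RecordingAPI()
result = predict_from_csv(fake, "./data/iris.csv",
                          {"sepal length": 5, "sepal width": 2.5})
print(fake.calls)
```

With a real connection you would pass `BigML()` instead of the stand-in and print the result with `api.pprint`.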

Additional Information

We’ve just barely scratched the surface. For additional information, see the full documentation for the Python bindings on Read the Docs. Alternatively, the same documentation can be built from a local checkout of the source by installing Sphinx ($ pip install sphinx) and then running:

$ cd docs
$ make html

Then launch docs/_build/html/index.html in your browser.

How to Contribute

Please follow these steps:

  1. Fork the project on github.com.
  2. Create a new branch.
  3. Commit changes to the new branch.
  4. Send a pull request.

For details on the underlying API, see the BigML API documentation.

BigML Node.js Bindings



BigML makes machine learning easy by taking care of the details required to add data-driven decisions and predictive power to your company. Unlike other machine learning services, BigML creates beautiful predictive models that can be easily understood and interacted with.

These BigML Node.js bindings allow you to interact with BigML.io, the API for BigML. You can use them to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models, and predictions).

This module is licensed under the Apache License, Version 2.0.

Support

Please report problems and bugs to our BigML.io issue tracker.

Discussions about the different bindings take place in the general BigML mailing list, or you can join us in our Campfire chatroom.

Requirements

Node 0.10 is currently supported by these bindings.

The only mandatory third-party dependencies are the request, winston and form-data libraries.

The testing environment requires the additional mocha package that can be installed with the following command:

$ sudo npm install -g mocha

Installation

To install the latest stable release with npm:

$ npm install bigml

You can also install the development version of the bindings by cloning the Git repository to your local computer and issuing:

$ npm install .

Testing

The test suite is run automatically using mocha as the test framework. Because all the tested API objects make one or more connections to remote resources on bigml.com, you may have to increase the default timeout used by mocha in each test. For instance:

$ mocha -t 20000

will set the timeout limit to 20 seconds. This limit should typically be enough, but you can change it to fit the latencies of your connection.

Importing the modules

To use the library, import it with require:

$ node
> bigml = require('bigml');

this will give you access to the following library structure:

- bigml.constants       common constants
- bigml.BigML           connection object
- bigml.Resource        common API methods
- bigml.Source          Source API methods
- bigml.Dataset         Dataset API methods
- bigml.Model           Model API methods
- bigml.Ensemble        Ensemble API methods
- bigml.Prediction      Prediction API methods
- bigml.Evaluation      Evaluation API methods
- bigml.LocalModel      Model for local predictions
- bigml.LocalEnsemble   Ensemble for local predictions

Authentication

All the requests to BigML.io must be authenticated using your username and API key and are always transmitted over HTTPS.

This module will look for your username and API key in the environment variables BIGML_USERNAME and BIGML_API_KEY respectively. You can add the following lines to your .bashrc or .bash_profile to set those variables automatically when you log in:

export BIGML_USERNAME=myusername
export BIGML_API_KEY=ae579e7e53fb9abd646a6ff8aa99d4afe83ac291

With that environment set up, connecting to BigML is a breeze:

connection = new bigml.BigML();

Otherwise, you can initialize directly when instantiating the BigML class as follows:

connection = new bigml.BigML('myusername',
                             'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291')

You can also initialize the library to work in the Sandbox environment by setting the third parameter, devMode, to true:

connection = new bigml.BigML('myusername',
                             'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291',
                             true)

Quick Start

Let’s see the steps that will lead you from this CSV file containing the Iris flower dataset to predicting the species of a flower whose sepal length is 5 and whose sepal width is 2.5. By default, BigML considers the last field (species) in the row as the objective field (i.e., the field that you want to generate predictions for). The CSV structure is:

sepal length,sepal width,petal length,petal width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
...

The steps required to generate a prediction are creating a set of source, dataset and model objects:

    var bigml = require('bigml');
    var source = new bigml.Source();
    source.create('./data/iris.csv', function(error, sourceInfo) {
      if (!error && sourceInfo) {
        var dataset = new bigml.Dataset();
        dataset.create(sourceInfo, function(error, datasetInfo) {
          if (!error && datasetInfo) {
            var model = new bigml.Model();
            model.create(datasetInfo, function (error, modelInfo) {
              if (!error && modelInfo) {
                var prediction = new bigml.Prediction();
                prediction.create(modelInfo, {'petal length': 1})
              }
            });
          }
        });
      }
    });

Note that in our example the prediction.create call has no associated callback. All the CRUD methods of any resource accept a callback as the last parameter, but if you don’t provide one, the default action is to print the resulting resource or the error. For the create method:

> result: 
{ code: 201,
  object: 
   { category: 0,
     code: 201,
     content_type: 'text/csv',
     created: '2013-06-08T15:22:36.834797',
     credits: 0,
     description: '',
     fields_meta: { count: 0, limit: 1000, offset: 0, total: 0 },
     file_name: 'iris.csv',
     md5: 'd1175c032e1042bec7f974c91e4a65ae',
     name: 'iris.csv',
     number_of_datasets: 0,
     number_of_ensembles: 0,
     number_of_models: 0,
     number_of_predictions: 0,
     private: true,
     resource: 'source/51b34c3c37203f4678000020',
     size: 4608,
     source_parser: {},
     status: 
      { code: 1,
        message: 'The request has been queued and will be processed soon' },
     subscription: false,
     tags: [],
     type: 0,
     updated: '2013-06-08T15:22:36.834844' },
  resource: 'source/51b34c3c37203f4678000020',
  location: 'https://localhost:1026/andromeda/source/51b34c3c37203f4678000020',
  error: null }

The generated objects can be retrieved, updated and deleted through the corresponding REST methods. For instance, in the previous example you would use:

    var bigml = require('bigml');
    var source = new bigml.Source();
    // Retrieve the source by its resource id and print it.
    source.get('source/51b25fb237203f4410000010', function (error, resource) {
      if (!error && resource) {
        console.log(resource);
      }
    });

to recover and show the source information.

You can also generate local predictions using the information of your models:

    bigml = require('bigml');
    var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
    localModel.predict({'petal length': 1},
                       function(error, prediction) {console.log(prediction)});

And similarly, for your ensembles

    bigml = require('bigml');
    var localEnsemble = new bigml.LocalEnsemble('ensemble/51901f4337203f3a9a000215');
    localEnsemble.predict({'petal length': 1}, 0, 
                          function(error, prediction) {console.log(prediction)});

will generate a prediction by combining the predictions of each of the models they enclose. The example uses the plurality combination method, whose code is 0; check the docs for more information about the available combination methods.
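As a rough illustration of what a plurality combination means for a categorical objective field, the sketch below picks the label predicted by the most models. It is a generic majority-vote example written in Python, not the implementation used by the bindings; the function name and the tie rule are assumptions made for this sketch.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label predicted by the largest number of models.

    Ties are resolved in favour of the label that appears first in
    `predictions`; that tie rule is a choice made for this sketch.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


votes = ["Iris-setosa", "Iris-versicolor", "Iris-setosa"]
print(plurality_vote(votes))
```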

Additional Information

We’ve just drawn a first sketch. For additional information, see the files included in the docs folder.

How to Contribute

Please follow these steps:

  1. Fork the project on github.com.
  2. Create a new branch.
  3. Commit changes to the new branch.
  4. Send a pull request.

For details on the underlying API, see the BigML API documentation.

Simple R bindings for BigML.io


This repo contains the source code used to generate the BigML API bindings for R.

Please report problems and bugs to our BigML.io issue tracker.

Discussions about the different bindings take place in the general BigML mailing list, or you can join us in our Campfire chatroom.

Build

There is a small bundle.R script that will build a CRAN-ready bundle. Roxygen2 is necessary for building the package documentation. Simply run the script in R while in the project directory.

Tests

There is a small unit test script called run_tests.R.

These tests compare the class structure of BigML responses to the expected class structure (in JSON form). They also check for specific known responses (e.g., a particular response for a prediction request on a model trained from the iris dataset). Simply evaluate the code in run_tests.R in the project root in order to run the tests.

The test script requires the testthat library by Hadley Wickham. Simply run the script in R while in the project directory.

It is necessary to run setCredentials() beforehand, or to set BIGML_USERNAME and BIGML_API_KEY appropriately in your .Renviron file.

There are some small utilities (misc.R) that make it easier to manipulate the complex data structures returned by R and the BigML API.