----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames, colnames<-, ifelse, is.character, is.factor,
    is.numeric, log, log10, log1p, log2, round, signif, trunc

Loading required package: lubridate

Attaching package: ‘lubridate’

The following objects are masked from ‘package:h2o’:

    day, hour, month, week, year

The following object is masked from ‘package:base’:

    date

Loading required package: PerformanceAnalytics
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric


Package PerformanceAnalytics (1.5.2) loaded.
Copyright (c) 2004-2018 Peter Carl and Brian G. Peterson, GPL-2 | GPL-3
https://github.com/braverock/PerformanceAnalytics


Attaching package: ‘PerformanceAnalytics’

The following object is masked from ‘package:graphics’:

    legend

Loading required package: quantmod
Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.
Learn from a quantmod author: https://www.datacamp.com/courses/importing-and-managing-financial-data-in-r
Loading required package: tidyverse
── Attaching packages ─────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.2     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ lubridate::date()        masks base::date()
✖ lubridate::day()         masks h2o::day()
✖ dplyr::filter()          masks stats::filter()
✖ dplyr::first()           masks xts::first()
✖ lubridate::hour()        masks h2o::hour()
✖ lubridate::intersect()   masks base::intersect()
✖ dplyr::lag()             masks stats::lag()
✖ dplyr::last()            masks xts::last()
✖ lubridate::month()       masks h2o::month()
✖ lubridate::setdiff()     masks base::setdiff()
✖ lubridate::union()       masks base::union()
✖ lubridate::week()        masks h2o::week()
✖ lubridate::year()        masks h2o::year()
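Because of the conflict messages above, an unqualified call such as `day()`, `filter()` or `first()` resolves to whichever package was attached most recently. The minimal base-R sketch below shows how to inspect and avoid this. `utils::find()` lists every attached package that defines a name, in search order, so the first entry is the one that wins. The `pkg::fun` form pins the intended version. The sketch uses only base packages, so its exact output depends on what is attached in your session.

```r
# Which attached packages define `date`? The first entry is the one an
# unqualified call will use; with lubridate attached it comes before base.
where_date <- utils::find("date")
print(where_date)

# Calling through the namespace always uses the named package's version,
# regardless of attach order.
today_string <- base::date()
print(today_string)

# The same pattern applies to the conflicts listed above, e.g.
# h2o::day(), stats::filter() or xts::first() when those are intended.
```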
ds_test_data %>% glimpse()
Observations: 1,690
Variables: 8
$ organization_id       <dbl> 67, 80, 588, 1005, 1098, 1216, 1230, 1339, 1384, 1692, 1704, 1744, 1822, 1862, 1895, ...
$ feature_adoption_rate <dbl> 0.00, NA, 0.67, 0.48, 0.67, 0.58, 0.73, 0.00, 0.15, 0.73, NA, 0.48, 0.58, 0.64, 0.70,...
$ owner_operator        <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0...
$ jobs_completed        <dbl> 0, 194, 81, 50, 8, 71, 45, 0, 0, 238, 43, 6, 24, 308, 106, 0, 0, 106, 151, 35, 22, 0,...
$ cc_rate               <dbl> 2.90, 2.69, 2.69, 2.69, 2.69, 2.69, 2.90, 2.69, 2.69, 2.69, 2.90, 2.90, 2.90, 2.69, 2...
$ plan_tier             <chr> "starter", "small", "medium", "small", "small", "small", "small", "small", "small", "...
$ vertical              <chr> "Other", "Plumbing", "Other", "Heating & Air Conditioning", "Other", "Carpet Cleaning...
$ cc                    <dbl> 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1...
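ds_test_data_clean <- ds_test_data %>%
  select(-organization_id) %>%                            # ID column is not a predictor
  select_if(~ !is.Date(.)) %>%                            # drop any Date columns (lubridate::is.Date)
  select_if(~ !any(is.na(.))) %>%                         # drop columns containing NAs, e.g. feature_adoption_rate
  mutate_if(is.ordered, ~ as.character(.) %>% as.factor)  # ordered factors -> plain factors
ds_test_data_clean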
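The `select_if(~ !any(is.na(.)))` step drops every column that contains at least one `NA`. The `glimpse()` output shows `NA`s in `feature_adoption_rate`, so that column does not survive cleaning. Here is a base-R sketch of the same column filter on a small hypothetical frame. The column names mirror the real data, but the values are invented.

```r
# Hypothetical toy data; column names mirror ds_test_data, values are made up.
toy <- data.frame(
  organization_id       = c(67, 80, 588),
  feature_adoption_rate = c(0.00, NA, 0.67),
  jobs_completed        = c(0, 194, 81),
  cc                    = c(0, 1, 1)
)

# Drop the ID column, then keep only columns with no missing values,
# mirroring select(-organization_id) %>% select_if(~ !any(is.na(.))).
toy <- toy[, setdiff(names(toy), "organization_id"), drop = FALSE]
has_na <- vapply(toy, function(col) any(is.na(col)), logical(1))
toy_clean <- toy[, !has_na, drop = FALSE]

print(names(toy_clean))
```

Dropping the whole column discards the feature entirely. Imputing the missing values or dropping only the affected rows are alternatives, and which works better depends on how much of the column is missing.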
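# Convert the response (cc) and the character predictors to factors
# (this overwrites ds_test_data with the cleaned, converted data)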
ds_test_data <- ds_test_data_clean %>% 
  mutate(cc = as.factor(cc), vertical = as.factor(vertical), plan_tier = as.factor(plan_tier))
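Converting `cc` to a factor matters for more than tidiness. H2O chooses between classification and regression based on the response column's type. A numeric 0/1 `cc` would be modelled as a regression target, whereas a factor produces the binomial models (and the `H2OBinomialMetrics` output) shown later. Here is a base-R sketch of the conversion on hypothetical values shaped like these columns.

```r
# Hypothetical values shaped like the cc, plan_tier and vertical columns.
toy <- data.frame(
  cc        = c(0, 1, 1, 0),
  plan_tier = c("starter", "small", "medium", "small"),
  vertical  = c("Other", "Plumbing", "Other", "Carpet Cleaning"),
  stringsAsFactors = FALSE
)

# Convert the response and the two character predictors to factors,
# as the mutate() call above does.
to_factor <- c("cc", "plan_tier", "vertical")
toy[to_factor] <- lapply(toy[to_factor], as.factor)

str(toy)
levels(toy$cc)          # the two classes a classifier would predict
nlevels(toy$plan_tier)  # number of distinct plan tiers in the toy data
```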
# Split into training, validation and test sets
## 75% of the rows for training; the remaining 25% split evenly into validation and test
train_size <- floor(0.75 * nrow(ds_test_data))
valid_size <- floor(0.50 * (nrow(ds_test_data) - train_size))
## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ds_test_data)), size = train_size)
train_tbl <- ds_test_data[train_ind, ]
## draw validation and test rows only from rows not used for training,
## so the three sets do not overlap
holdout_ind <- setdiff(seq_len(nrow(ds_test_data)), train_ind)
valid_ind <- sample(holdout_ind, size = valid_size)
valid_tbl <- ds_test_data[valid_ind, ]
test_ind <- setdiff(holdout_ind, valid_ind)
test_tbl <- ds_test_data[test_ind, ]
valid_no_test <- ds_test_data[-train_ind, ]   # full 25% holdout (validation + test)
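A disjoint three-way split is easiest to get right by shuffling the row indices once and then cutting the shuffled vector into consecutive pieces. That way no row can land in two sets. The sketch below is a minimal base-R version that assumes the 75% / 12.5% / 12.5% proportions stated in the split comment. `split_indices()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: returns disjoint train/valid/test row indices.
split_indices <- function(n, train_frac = 0.75, seed = 123) {
  set.seed(seed)
  shuffled   <- sample(seq_len(n))              # one random permutation of 1..n
  train_size <- floor(train_frac * n)
  valid_size <- floor(0.5 * (n - train_size))
  list(
    train = shuffled[seq_len(train_size)],
    valid = shuffled[train_size + seq_len(valid_size)],
    test  = utils::tail(shuffled, n - train_size - valid_size)  # all remaining rows
  )
}

idx <- split_indices(1690)
lengths(idx)
```

Each set is then `ds_test_data[idx$train, ]`, `ds_test_data[idx$valid, ]` and `ds_test_data[idx$test, ]`. With 1,690 rows this gives 1,267 / 211 / 212 rows. The test set absorbs the one row left over by `floor()`.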
h2o.init()        # Fire up h2o

H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /var/folders/wx/3_s_5dj14l9c1qtmg26zks700000gn/T//RtmpPN49zM/h2o_superjohn_started_from_r.out
    /var/folders/wx/3_s_5dj14l9c1qtmg26zks700000gn/T//RtmpPN49zM/h2o_superjohn_started_from_r.err
java version "9.0.1"
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)

Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 621 milliseconds 
    H2O cluster timezone:       America/Los_Angeles 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.20.0.8 
    H2O cluster version age:    2 months and 4 days  
    H2O cluster name:           H2O_started_from_R_superjohn_ymc288 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   4.00 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
    R Version:                  R version 3.5.0 (2018-04-23) 
h2o.no_progress() # Turn off progress bars
# Convert to H2OFrame objects
train_h2o <- as.h2o(train_tbl)
valid_h2o <- as.h2o(valid_tbl)
test_h2o  <- as.h2o(test_tbl)
valid_no_test_h20 <- as.h2o(valid_no_test)
# Response (y) and predictor (x) column names for h2o
y <- "cc"
x <- setdiff(names(train_h2o), y)
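# AutoML trains and ranks many candidate models (e.g. GLMs, random forests,
# GBMs, deep learning, stacked ensembles); the leader is the top model by sort_metric
automl_models_h2o <- h2o.automl(
  project_name = "ds_test_models",
  x = x,
  y = y,
  training_frame = train_h2o,
  validation_frame = valid_h2o,
  leaderboard_frame = test_h2o,   # models are ranked on test_h2o, so later test scores are not fully independent of model selection
  max_runtime_secs = 60,
  stopping_metric = "AUC",
  sort_metric = "AUC")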
# Extract leader model
automl_leader <- automl_models_h2o@leader
# Predictions and performance metrics on the test set
pred_h2o <- h2o.predict(automl_leader, test_h2o)
h2o.performance(automl_leader, test_h2o)
H2OBinomialMetrics: drf

MSE:  0.1226938
RMSE:  0.3502767
LogLoss:  0.4072412
Mean Per-Class Error:  0.1614379
AUC:  0.9000934
Gini:  0.8001867

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:

Maximum Metrics: Maximum metrics at their respective thresholds

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
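Several of the reported numbers are simple functions of the predicted probabilities, which makes them easy to sanity-check. RMSE is the square root of MSE: sqrt(0.1226938) ≈ 0.3503. Gini is 2 * AUC - 1: 2 * 0.9000934 - 1 = 0.8001868, which matches the reported 0.8001867 up to rounding of the printed AUC. The base-R sketch below computes these metrics, plus a confusion matrix at the F1-optimal threshold, from hypothetical labels and probabilities. It follows the standard definitions rather than H2O's internal code; for example, H2O may search a different set of candidate thresholds. `binomial_metrics()` is a made-up name.

```r
# Hypothetical helper; y is 0/1, p is the predicted probability of class 1.
binomial_metrics <- function(y, p) {
  eps <- 1e-15
  p_c <- pmin(pmax(p, eps), 1 - eps)            # clip to avoid log(0) in LogLoss
  mse <- mean((y - p)^2)

  # AUC via the Mann-Whitney rank formula (ties get average ranks).
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  r  <- rank(p)
  auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

  # Choose the threshold that maximises F1 among the observed probabilities.
  thresholds <- sort(unique(p))
  f1 <- vapply(thresholds, function(t) {
    pred <- as.integer(p >= t)
    tp <- sum(pred == 1 & y == 1)
    fp <- sum(pred == 1 & y == 0)
    fn <- sum(pred == 0 & y == 1)
    if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
  }, numeric(1))
  best <- thresholds[which.max(f1)]

  # Confusion matrix (rows: actual, columns: predicted) at that threshold.
  pred <- factor(as.integer(p >= best), levels = c(0, 1))
  cm <- table(actual = factor(y, levels = c(0, 1)), predicted = pred)

  # Mean per-class error: average of the error rates within each actual class.
  per_class_err <- 1 - diag(cm) / rowSums(cm)

  list(mse = mse, rmse = sqrt(mse),
       logloss = -mean(y * log(p_c) + (1 - y) * log(1 - p_c)),
       auc = auc, gini = 2 * auc - 1,
       threshold = best, f1 = max(f1),
       confusion = cm, mean_per_class_error = mean(per_class_err))
}

# Invented labels and probabilities, for illustration only.
y <- c(0, 0, 1, 1, 0, 1, 1, 0)
p <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.65, 0.55)
m <- binomial_metrics(y, p)
str(m[c("rmse", "logloss", "auc", "gini", "threshold")])
m$confusion
```

On these toy values, 14 of the 16 positive/negative pairs are ranked correctly, so AUC is 0.875 and Gini is 0.75. F1 peaks at a threshold of 0.65.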
h2o.varimp(automl_leader)
Variable Importances: 
h2o.varimp_plot(automl_leader, 20)
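This sketch assumes the usual layout of H2O's variable-importance table. Each row has a `relative_importance`; for tree models this is roughly the total reduction in squared error from splits on that variable. There is also a `scaled_importance`, which divides by the largest value so the top variable scores 1, and a `percentage`, which divides by the sum. The base-R code below applies those two normalisations to invented numbers. It uses placeholder variable names so the values cannot be mistaken for this model's results.

```r
# Invented relative importances with placeholder names (NOT this model's output).
varimp <- data.frame(
  variable            = c("var_a", "var_b", "var_c", "var_d", "var_e"),
  relative_importance = c(520, 180, 130, 90, 30),
  stringsAsFactors    = FALSE
)

# Scaled importance: relative to the most important variable.
varimp$scaled_importance <- varimp$relative_importance / max(varimp$relative_importance)
# Percentage: share of the total importance.
varimp$percentage <- varimp$relative_importance / sum(varimp$relative_importance)

varimp <- varimp[order(-varimp$relative_importance), ]
print(varimp)
```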

h2o.download_mojo(automl_leader, path = "~/Downloads/", get_jar = FALSE)  # save the leader as a MOJO zip, without h2o-genmodel.jar
[1] "DRF_0_AutoML_20181126_124912.zip"
h2o.partialPlot(automl_leader, data = train_h2o, cols = "jobs_completed")
PartialDependence: Partial Dependence Plot of model DRF_0_AutoML_20181126_124912 on column 'jobs_completed'
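A partial dependence plot answers one question: if every row's `jobs_completed` were set to a value v, what would the average predicted probability be? The base-R sketch below performs that computation. It uses a logistic regression fitted to simulated data as a stand-in for the DRF leader. The simulated data, the `glm` model, and the helper `partial_dependence()` are all assumptions for illustration. H2O's own implementation may differ, for example in how it chooses the grid of values.

```r
set.seed(42)
n <- 500
# Simulated stand-in data; only the column names echo the real dataset.
sim <- data.frame(
  jobs_completed = round(runif(n, 0, 300)),
  owner_operator = rbinom(n, 1, 0.2)
)
# Assumed data-generating process: log-odds of cc rise with jobs_completed.
sim$cc <- rbinom(n, 1, plogis(-2 + 0.02 * sim$jobs_completed - 0.5 * sim$owner_operator))

model <- glm(cc ~ jobs_completed + owner_operator, data = sim, family = binomial)

# Hypothetical helper: mean prediction with `col` forced to each grid value.
partial_dependence <- function(model, data, col, grid) {
  vapply(grid, function(v) {
    data[[col]] <- v                               # overwrite the column for every row
    mean(predict(model, newdata = data, type = "response"))
  }, numeric(1))
}

grid <- seq(0, 300, by = 50)
pd <- partial_dependence(model, sim, "jobs_completed", grid)
print(data.frame(jobs_completed = grid, mean_response = pd))
```

Averaging over the real rows, rather than predicting for a single "typical" row, keeps the other predictors at their observed joint distribution. This is what distinguishes partial dependence from a simple one-row what-if.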