h2oai/app-malicious-domains

Name: app-malicious-domains

Owner: H2O.ai

Description: Domain name classifier looking for good vs. possibly malicious providers

Created: 2016-03-08 21:02:49.0

Updated: 2018-05-04 16:53:28.0

Pushed: 2018-05-04 16:53:26.0

Homepage: null

Size: 51089

Language: HTML

README

Building a Machine Learning Application with AWS Lambda

This example builds a machine learning application using AWS Lambda, an Amazon service that automatically manages compute resources for request-driven code. It simplifies scaling microservices by eliminating the need to provision or manage servers. The front-end of the application is a web browser, while the back-end is a Lambda function whose components include a function handler, Jython code for feature munging, and an H2O model POJO. The front-end and back-end communicate via a REST endpoint.


The application classifies domain names as legitimate or malicious. Malicious domains earn their label by engaging in malicious activity, such as botnets, phishing, and malware hosting. To evade security systems, attackers use domain names that are generated by algorithms. To detect domains that may be malicious, the app builds a model based on linguistic features that distinguish regular domains from algorithmically generated ones.

| Legitimate domains | Malicious domains |
|:-------------------|:------------------|
| h2o | zyxgifnjobqhzptuodmzov |
| zen-cart | c3p4j7zdxexg1f2tuzk117wyzn |
| fedoraforum | batdtrbtrikw |

The “Make Data Products” presentation given at the Silicon Valley Big Data Science meetup on March 17, 2016 references this repo.

Files

| Data | Offline | Front-end | Back-end |
|------|---------|-----------|----------|
| legit-dga_domains.csv | build.gradle | src/main/webapp/index.html | lib/h2o-genmodel.jar (downloaded) |
| src/main/resources/words.txt | h2o-model.py | src/main/webapp/app.js | lib/aws-lambda-java-core-1.0.0.jar |
| | | | lib/jython-standalone-2.7.0.jar |
| | | | src/main/java/Classify.java |
| | | | src/main/java/MaliciousDomainModel.java (generated) |
| | | | src/main/resources/pymodule.py |

Steps to run
Step 1: Create the gradle wrapper to get a stable version of gradle.

    gradle wrapper

Step 2: Install the latest stable build of the h2o Python module if you don't have it already.

http://www.h2o.ai/download/h2o/python

Step 3: Build project

    ./gradlew build

Step 4: Create AWS Lambda function
4.1 Sign in to the AWS Management Console and open the AWS Lambda console.

4.2 Click “Get Started Now”, or if you have already created functions, click “Create a Lambda function”.

4.3 Click “Skip” on the bottom right.

4.4 Configure the Lambda function.

* In the Name text field, type “malicious-domain-classifier”.
* In the Runtime field, select “Java 8”.
* Click the Upload button and select app-malicious-domains/build/distributions/app-malicious-domains.zip in the file selector.
* In the Handler field, type “Classify::myHandler”.
* In the Role field, select “*Basic execution role”. In the new tab, click “Allow” on the bottom right.
* Click “Next” on the bottom right, which opens the Review page.
* Click “Create function” on the bottom right.
* If this step fails, upload app-malicious-domains.zip to S3, click “Previous”, and provide the S3 link URL at “Upload a .ZIP from Amazon S3”.

4.5 Test the Lambda function (optional).

* Click “Actions” and select “Configure test event” near the top left of the page.
* Enter the domain name to be classified in JSON format, for example {"domain":"plzdonthackmekthxbye"}, and click “Save and test”.
* Execution results near the bottom of the page should display “succeeded” and give a JSON response.
* If an error message shows that the task timed out, click “Advanced settings” and increase the Timeout field.

Step 5: Create API endpoint for Lambda function.
5.1 Click the “API endpoints” tab and then “Add API endpoint”.

5.2 Configure the API endpoint.

* Select API Gateway for the API endpoint type field.
* Select “POST” for the Method field.
* Type “prod” for the Deployment stage field.
* Select “Open” for the Security field.
* Click “Submit”.
* Write down the API endpoint URL that now appears in the API endpoint tab. It will be needed for step 6.1.

5.3 Enable CORS.

* Open the API Gateway console in the AWS Management Console.
* Select “LambdaMicroservice”.
* Select “/malicious-domain-classifier” on the left sidebar.
* Click “Enable CORS”.
* Click “Enable CORS and replace existing CORS headers” on the bottom right.
* Click “Yes, replace existing values” in the pop-up window.
* Click “Deploy API” near the top left.
* Select “prod” in the Deployment stage field and click “Deploy”.

Step 6: Deploy the .war file
6.1 Open app-malicious-domains/src/main/webapp/app.js and change line 26 to the API endpoint URL.

6.2 Run the following command:

    ./gradlew jettyRunWar -x generateModel

(If you don't include the -x generateModel above, you will build the models and deployment package again, which is time-consuming.)

Step 7: Visit the webapp in a browser.

http://localhost:8080/

Underneath the hood
H2O Model: Logistic Regression with regularization

Features

* string length
* Shannon entropy
* proportion of vowels
* count of substrings that are English words

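
The four features listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the repo's `pymodule.py`: the function names, the lowercase normalization, the base-2 entropy, the minimum word length of 3, and the tiny word set in the usage note are all hypothetical choices.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(s):
    """Shannon entropy (in bits) of the character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def proportion_vowels(s):
    """Fraction of characters in s that are vowels."""
    if not s:
        return 0.0
    return sum(ch in VOWELS for ch in s.lower()) / len(s)


def count_word_substrings(s, words, min_len=3):
    """Count substring occurrences of s (length >= min_len) found in words.

    min_len is an assumed cutoff to avoid counting very short matches.
    """
    s = s.lower()
    return sum(
        1
        for i in range(len(s))
        for j in range(i + min_len, len(s) + 1)
        if s[i:j] in words
    )


def domain_features(domain, words):
    """Bundle the four features into a dict (key names are illustrative)."""
    return {
        "length": len(domain),
        "entropy": shannon_entropy(domain),
        "proVowels": proportion_vowels(domain),
        "numWords": count_word_substrings(domain, words),
    }
```

For example, `domain_features("fedoraforum", {"fedora", "forum", "for", "ora"})` counts four word substrings, whereas an algorithmically generated name such as `batdtrbtrikw` would typically match few or none against a real dictionary like `src/main/resources/words.txt`.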
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.493541945983:

           0      1      Error    Rate
    -----  -----  -----  -------  ---------------
    0      15889  315    0.0194   (315.0/16204.0)
    1      346    10043  0.0333   (346.0/10389.0)
    Total  16235  10358  0.0249   (661.0/26593.0)

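
The error rates in the confusion matrix above can be re-derived from its four counts, along with precision, recall, and F1. The short sketch below does that arithmetic; it assumes rows are actual classes, columns are predicted classes, and class 1 is the malicious class. The variable names are illustrative.

```python
# Counts copied from the confusion matrix (rows = actual, columns = predicted).
# Assumption: class 0 = legitimate, class 1 = malicious.
tn, fp = 15889, 315    # actual 0: predicted 0, predicted 1
fn, tp = 346, 10043    # actual 1: predicted 0, predicted 1

class0_error = fp / (tn + fp)                    # error rate on actual class 0
class1_error = fn / (fn + tp)                    # error rate on actual class 1
total_error = (fp + fn) / (tn + fp + fn + tp)    # overall misclassification rate

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"class 0 error: {class0_error:.4f}")
print(f"class 1 error: {class1_error:.4f}")
print(f"total error:   {total_error:.4f}")
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```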
Model Prediction via API endpoint

Make the following POST request with curl using the API endpoint URL.

    curl -X POST -d "{\"domain\":\"plzdonthackmekthxbye\"}" <api_endpoint_url>

JSON response with label and class probabilities.

    {
      "label": 1,
      "class0Prob": 0.002564083122440164,
      "class1Prob": 0.9974359168775598,
      "intercept": -14.94132841574946,
      "length": 29.841565204329598,
      "entropy": 11.178560649883826,
      "proVowels": -1.7679609134401084,
      "numWords": -18.347249579636706
    }
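
The non-probability fields in the response above appear to be the additive terms of the logistic regression's linear predictor: summing them and applying the logistic function reproduces the reported class 1 probability. The sketch below checks this with the values shown. The key names are reconstructed from the response text and the interpretation is an inference from the numbers, not something the repo states.

```python
import math

# Values copied from the example response; key names are reconstructed.
response = {
    "label": 1,
    "class0Prob": 0.002564083122440164,
    "class1Prob": 0.9974359168775598,
    "intercept": -14.94132841574946,
    "length": 29.841565204329598,
    "entropy": 11.178560649883826,
    "proVowels": -1.7679609134401084,
    "numWords": -18.347249579636706,
}

# Assumed to be the intercept plus one (coefficient * feature) term per feature.
TERMS = ("intercept", "length", "entropy", "proVowels", "numWords")


def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


logit = sum(response[k] for k in TERMS)
class1_prob = sigmoid(logit)

print("logit:", logit)
print("recomputed class 1 probability:", class1_prob)
print("reported class 1 probability:  ", response["class1Prob"])
```

Read this way, the response also shows which features pushed the prediction: here the length and entropy terms raise the log-odds of class 1, while the vowel and word-count terms lower it.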

Troubleshooting
Error uploading .zip file (Step 4.4)

Check if the function already exists and, if not, try again. For slower internet connections, try uploading the .zip file with an S3 link in the Code tab.

Timeout when testing Lambda function (Step 4.5)

In the AWS Lambda console, click the Configuration tab. Click Advanced settings and increase the timeout field.

Gateway timeout (504 error) or “(Invalid input)” when using webapp (Step 7)

This is caused by Lambda's cold start. Keep submitting domain names; within a minute or so, the webapp should become responsive.
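
A scripted client can ride out the cold start by retrying on 504 responses with a growing delay. The sketch below is one possible approach, not part of the repo. `call_with_retry`, its parameters, and the `send` callable (any function that takes a payload and returns a `(status_code, body)` pair) are hypothetical names introduced for illustration. The defaults wait 2 + 4 + 8 + 16 + 32 seconds across six attempts, roughly the one-minute window mentioned above.

```python
import time


def call_with_retry(send, payload, attempts=6, delay_s=2.0, backoff=2.0,
                    sleep=time.sleep):
    """Call send(payload) until it returns a status other than 504.

    send is a caller-supplied function returning (status_code, body).
    The delay between attempts is multiplied by backoff each time.
    Returns the last (status_code, body) seen.
    """
    status, body = None, None
    for attempt in range(attempts):
        status, body = send(payload)
        if status != 504:
            return status, body
        if attempt < attempts - 1:
            sleep(delay_s)       # wait before the next attempt
            delay_s *= backoff   # exponential backoff
    return status, body
```

The `send` function would wrap whatever HTTP client posts `{"domain": ...}` to the API endpoint URL. This sketch only handles the 504 case; the “(Invalid input)” message is shown by the webapp itself and is not covered here.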

Performance

Performance was tested with JMeter on a MacBook Pro with a 2.5 GHz Intel Core i7 on a wireless internet connection over the office WAN. Before testing, a warm-up cycle of 100 loops was run. Times are in milliseconds. The body data of the POST request was {"domain":"plzdonthackmekthxbye"}.

| Memory (MB) | Threads | Loops | Samples | Average | Median | 90% | 95% | 99% | Min | Max | Error % | Throughput (calls/sec) |
|-------------|---------|-------|---------|---------|--------|-----|-----|------|-----|-------|---------|------------------------|
| 512 | 1 | 10000 | 10000 | 113 | 102 | 118 | 138 | 426 | 85 | 2137 | 0 | 8.4 |
| 512 | 10 | 1000 | 10000 | 170 | 102 | 148 | 182 | 334 | 85 | 30330 | 0.18 | 44 |
| 512 | 100 | 100 | 10000 | 392 | 149 | 643 | 943 | 1738 | 85 | 30307 | 0.43 | 168 |
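
As a rough sanity check on the table above, the sketch below converts each row's error percentage into a failed-sample count and computes a simple closed-loop bound of threads divided by average latency. It assumes each JMeter thread sends its next request only after the previous one returns. The function and key names are illustrative.

```python
# (threads, average_ms, error_pct, samples, measured_calls_per_sec) per row.
ROWS = [
    (1, 113, 0.0, 10000, 8.4),
    (10, 170, 0.18, 10000, 44),
    (100, 392, 0.43, 10000, 168),
]


def summarize(rows):
    """Return the failed-sample count and closed-loop throughput bound per row."""
    out = []
    for threads, avg_ms, error_pct, samples, measured in rows:
        out.append({
            "threads": threads,
            "failed_samples": round(samples * error_pct / 100.0),
            "bound_calls_per_sec": threads / (avg_ms / 1000.0),
            "measured_calls_per_sec": measured,
        })
    return out


for row in summarize(ROWS):
    print(row)
```

The measured throughput is below this bound in every row, and the gap widens as the thread count grows.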

References
Gradle

The gradle distribution shows how to do basic war and jetty plugin operations.

  1. https://services.gradle.org/distributions/gradle-2.7-all.zip
  2. unzip gradle-2.7-all
  3. cd gradle-2.7/samples/webApplication/customized
AWS Lambda

http://docs.aws.amazon.com/lambda/latest/dg/create-deployment-pkg-zip-java.html

Data Sources
