--- title: "Using rdomains" author: "Gaurav Sood" date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{Illustrating use of rdomains} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ### rdomains: Get the category of content hosted by a domain #### Install and Load the package The latest development version of the package will always be on GitHub. To install the package from GitHub and to load the installed package: ```{r, eval=FALSE, install} #library(devtools) install_github("themains/rdomains") ``` To install the package from CRAN, type in: ```{r, eval=FALSE, cran_install} install.packages("rdomains") ``` Next, load the package: ```{r, eval=FALSE, load_pkg} library(rdomains) ``` #### Shalla To get category of the content from [shallalist](http://www.shallalist.de), first download the latest file using: ```{r, eval=FALSE, down_shalla} get_shalla_data() ``` And then, get the category using: ```{r, eval=FALSE, shalla} shalla_cat("http://www.google.com") ``` ``` ## domain_name shalla_category ## 1 google.com searchengines ``` #### DMOZ To get category of the content from DMOZ, first download the archived parsed CSV file using: ```{r, eval=FALSE, down_dmoz} get_dmoz_data() ``` And then, get the category using: ```{r, eval=FALSE, dmoz} dmoz_cat("http://www.google.com") ``` #### ML Probability that Domain Hosts Adult Content Based on features of Domain Name and Suffix alone: ```{r, eval=FALSE, ml} adult_ml1_cat("http://www.google.com") ``` ``` ## domain_name category ## 1 google.com 0.3133728 ``` #### Virustotal Start by getting the API key from [virustotal](https://www.virustotal.com/). Get virustotal category by running: ```{r, eval=FALSE, virustotal} virustotal_cat("http://www.google.com") ``` ``` ## domain bitdefender dr_web alexa google websense trendmicro ## 1 http://www.google.com searchengines chats google searchengines advertisements search engines portals ``` #### Trusted (McAfee) Get the content category of a domain according to McAfee (Trusted): ```{r, eval=FALSE, trusted} trusted_cat("http://www.google.com") ``` ``` ## url status categorization reputation ## 2 http://www.google.com Categorized URL - Search Engines Minimal Risk ``` #### Alexa Category To get the category of content from Amazon (Alexa) (which provides it via DMOZ), start by getting credentials from [https://aws.amazon.com/](https://aws.amazon.com/). Next, set the environment variables: ```{r, eval=FALSE, set_alexa_cred} Sys.setenv("AWS_ACCESS_KEY_ID", "key_id") Sys.getenv("AWS_SECRET_ACCESS_KEY", "secret_key") ``` Then run, ```{r, eval=FALSE, alexa} alexa_cat(domain="http://www.google.com")[1,] ``` ``` ## Title AbsolutePath ## 1 Search Engines/Google Top/Computers/Internet/Searching/Search_Engines/Google ```