R Programming

Tidyverse

The Tidyverse world

In the day-to-day process of data analysis, you will probably use one of the packages below (or all of them!): ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats.

But these are not the only packages in the Tidyverse world. Many others are installed along with them, mainly those related to reading data, handling dates, and more (e.g., DBI, httr, googledrive, lubridate). You will certainly find packages for every stage of the data journey listed below:


Supervised Learning in R: Classification

Chapter 1 - k-Nearest Neighbors (kNN)

1.1 - Recognizing a road sign with kNN

After several trips with a human behind the wheel, it is time for the self-driving car to attempt the test course alone. As it begins to drive away, its camera captures the following image:

Figure 1: A caption

Can you apply a kNN classifier to help the car recognize this sign? The dataset signs must be loaded in your workspace along with the data frame next_sign, which holds the observation you want to classify.
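The classification step described here can be sketched with knn() from the class package. This is a minimal sketch, not the exercise's actual solution: the real signs data is not shown in this excerpt, so the data frame below is simulated, and the feature columns r1, g1, b1 and the three label values are hypothetical.

```r
library(class)  # provides knn()

# Hypothetical stand-in for the signs data: the real dataset is not shown here.
# Column 1 holds the label; the remaining columns are numeric features.
set.seed(42)
signs <- data.frame(
  sign_type = rep(c("stop", "speed", "pedestrian"), each = 20),
  r1 = c(rnorm(20, 200, 10), rnorm(20, 120, 10), rnorm(20, 60, 10)),
  g1 = c(rnorm(20, 40, 10), rnorm(20, 120, 10), rnorm(20, 180, 10)),
  b1 = c(rnorm(20, 40, 10), rnorm(20, 130, 10), rnorm(20, 70, 10))
)

# The observation to classify, with the same feature columns as signs[-1]
next_sign <- data.frame(r1 = 205, g1 = 35, b1 = 45)

# Labels for the training rows, as a factor
sign_types <- factor(signs$sign_type)

# Classify next_sign by its single nearest neighbor (k = 1 is knn()'s default)
prediction <- knn(train = signs[-1], test = next_sign, cl = sign_types, k = 1)
print(prediction)
```

Dropping the label column with signs[-1] keeps only the features in the training matrix, so the distance computation never sees the answer.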


Credit Card Fraud Detection

Objective

Our goal is to train a neural network to detect fraudulent credit card transactions in a dataset covering two days of transactions by European cardholders.

Source: https://www.kaggle.com/mlg-ulb/creditcardfraud/data

Data

credit <- read.csv(path)

The dataset contains transactions made by credit cards in September 2013 by European cardholders, all of which occurred within two days. As we can see, it consists of thirty explanatory variables and a response variable that indicates whether a transaction was fraudulent or not.
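As a rough illustration of the workflow described above, the sketch below fits a small neural network to simulated data. It makes several assumptions: the excerpt does not say which package the post uses, so nnet is a swapped-in choice, and the two features V1 and V2 and the 5% fraud rate are hypothetical stand-ins for the thirty real explanatory variables.

```r
library(nnet)  # single-hidden-layer networks; an assumed choice, not named in the text

set.seed(1)
n <- 2000
# Hypothetical stand-in for the credit data: two features instead of thirty,
# with a rare positive class (about 5% "fraud") to mimic class imbalance.
Class <- rbinom(n, 1, 0.05)
credit <- data.frame(
  V1 = rnorm(n, mean = ifelse(Class == 1, 3, 0)),
  V2 = rnorm(n, mean = ifelse(Class == 1, -3, 0)),
  Class = factor(Class)
)

# Check how imbalanced the response is before modelling
print(prop.table(table(credit$Class)))

# Hold out 30% of the rows for evaluation
test_idx <- sample(n, size = 0.3 * n)
train <- credit[-test_idx, ]
test  <- credit[test_idx, ]

# Fit a small network: 4 hidden units, weight decay for regularisation
fit <- nnet(Class ~ V1 + V2, data = train, size = 4, decay = 0.01,
            maxit = 200, trace = FALSE)

# Confusion matrix on the held-out rows
pred <- predict(fit, newdata = test, type = "class")
conf_mat <- table(predicted = pred, actual = test$Class)
print(conf_mat)
```

With a rare positive class, overall accuracy says little on its own, since predicting "not fraud" for every row already scores high. The confusion matrix shows how many fraudulent rows are actually caught.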


German Credit and Regression Tree

Objective

Train a model and use it to make predictions on the German Credit dataset.

Data

german <- read.csv(path)
str(german)
## 'data.frame': 1000 obs. of 21 variables:
## $ default : int 0 1 0 0 1 0 0 0 0 1 ...
## $ account_check_status : Factor w/ 4 levels "< 0 DM",">= 200 DM / salary assignments for at least 1 year",..: 1 3 4 1 1 4 4 3 4 3 .
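The train-then-predict objective can be sketched with the rpart package. This is only an illustration under stated assumptions: the excerpt does not show the modelling code, so rpart with method = "class" is a guess suited to the binary default column, and the duration and amount predictors are hypothetical names on simulated values.

```r
library(rpart)  # recursive partitioning trees

set.seed(7)
n <- 500
# Hypothetical stand-in for the German Credit data; only `default` is a
# column name taken from the str() output above, the rest are illustrative.
german <- data.frame(
  duration = round(runif(n, 6, 60)),
  amount   = round(runif(n, 500, 15000))
)
# Longer, larger loans default more often in this toy data
p <- plogis(-3 + 0.05 * german$duration + 0.0001 * german$amount)
german$default <- factor(rbinom(n, 1, p))

# 70/30 train/test split
train_idx <- sample(n, size = 0.7 * n)
train <- german[train_idx, ]
test  <- german[-train_idx, ]

# method = "class" grows a classification tree for the 0/1 response
tree <- rpart(default ~ ., data = train, method = "class")

# Predicted class for each held-out row, then the share predicted correctly
pred <- predict(tree, newdata = test, type = "class")
accuracy <- mean(pred == test$default)
print(accuracy)
```

Evaluating on rows the tree never saw gives a more honest estimate than scoring the training data, which a deep tree can fit almost perfectly.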


Correlation and Regression

path <- "C:/Users/andre/OneDrive/Área de Trabalho/salerno/blogdown/datasets/ncbirths"
path <- paste0(path, "/ncbirths.csv")
data <- read.csv(path, stringsAsFactors = FALSE)
dim(data)
## [1] 1450 15
names(data)
## [1] "ID" "Plural" "Sex" "MomAge"
## [5] "Weeks" "Marital" "RaceMom" "HispMom"
## [9] "Gained" "Smoke" "BirthWeightOz" "BirthWeightGm"
## [13] "Low" "Premie" "MomRace"
library(ggplot2)
ggplot(data = data, aes(y = BirthWeightOz, x = Weeks)) + geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
# Boxplot of weight vs.
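The relationship plotted above between Weeks and BirthWeightOz can also be summarised numerically. The sketch below uses base R only and simulated values, since the ncbirths file is not available here; only the two column names come from the names(data) output, and the numbers it prints say nothing about the real data.

```r
set.seed(123)
n <- 200
# Hypothetical stand-in for ncbirths: only the two column names are taken
# from the names(data) output; the values are simulated.
births <- data.frame(Weeks = round(rnorm(n, mean = 39, sd = 2)))
births$BirthWeightOz <- 4 * births$Weeks - 40 + rnorm(n, sd = 12)

# Pearson correlation; use = "complete.obs" drops rows with missing values,
# such as the one row ggplot2 warned about in the real data
r <- cor(births$Weeks, births$BirthWeightOz, use = "complete.obs")

# Simple linear regression of birth weight on gestation length
fit <- lm(BirthWeightOz ~ Weeks, data = births)
slope <- coef(fit)[["Weeks"]]

# For simple regression the slope equals r * sd(y) / sd(x)
slope_from_r <- r * sd(births$BirthWeightOz) / sd(births$Weeks)
print(c(r = r, slope = slope, slope_from_r = slope_from_r))
```

The last two values agree, which shows how correlation and the regression slope are two views of the same linear relationship, differing only by the ratio of the standard deviations.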


Classifying using Logistic Regression

1 - Objective

The objective of this example is to classify each sample as benign or malignant.

2 - Data

Let’s get the data.

BCData <- read.table(url("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"), sep = ",")
# setting column names
names(BCData) <- c('Id', 'ClumpThickness', 'CellSize', 'CellShape', 'MarginalAdhesion', 'SECellSize', 'BareNuclei', 'BlandChromatin', 'NormalNucleoli', 'Mitoses', 'Class')

3 - EDA - Exploratory Data Analysis

It’s important to extract preliminary knowledge from the dataset.

dim(BCData)
## [1] 699 11
str(BCData)
## 'data.frame': 699 obs.
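A minimal sketch of the modelling step the title refers to, using glm() from base R. To stay runnable offline it simulates a small BCData instead of downloading the file. The column names follow the names() call above and Class uses the UCI coding (2 = benign, 4 = malignant); the simulated values, the choice of two predictors, and the 0.5 threshold are assumptions for illustration.

```r
set.seed(2024)
n <- 300
# Hypothetical stand-in for BCData so the sketch runs offline. Column names
# follow the names() call above; Class uses the UCI coding 2 = benign, 4 = malignant.
malignant <- rbinom(n, 1, 0.35)
BCData <- data.frame(
  ClumpThickness = pmin(10, pmax(1, round(rnorm(n, ifelse(malignant == 1, 7, 3), 2)))),
  CellSize       = pmin(10, pmax(1, round(rnorm(n, ifelse(malignant == 1, 6, 2), 2)))),
  Class          = ifelse(malignant == 1, 4, 2)
)

# Recode the response to 0/1 for a binomial glm
BCData$Malignant <- as.integer(BCData$Class == 4)

# Logistic regression on two of the predictors
fit <- glm(Malignant ~ ClumpThickness + CellSize, data = BCData, family = binomial)

# Predicted probabilities, thresholded at 0.5 to get class labels
prob <- predict(fit, type = "response")
pred <- ifelse(prob > 0.5, 4, 2)
accuracy <- mean(pred == BCData$Class)
print(accuracy)
```

When working with the real file, note that missing values in BareNuclei are recorded as "?", so passing na.strings = "?" to read.table is likely needed before that column can be treated as numeric.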


Diagnosing breast cancer with the kNN algorithm

1 - Introduction

Could machine learning algorithms detect abnormal cell processes in advance? We know that this clinical battle is not easy, and that many people are involved in trying to identify a clear path to a cure. As a complement to human decision-making, could technology reduce the subjective bias inherent in the process and improve our decisions? We know that human processing capacity is limited compared with that of computers.
