Blogs

Functions

1. Defining functions

def square(number):
    print("The square of", number, "is", number ** 2)

square(7)
## The square of 7 is 49

2. Functions with multiple parameters

def maximum(value1, value2, value3):
    max_value = value1
    if value2 > max_value:
        max_value = value2
    if value3 > max_value:
        max_value = value3
    return max_value

maximum(12, 27, 36)
## 36
maximum('yellow', 'red', 'orange')
## 'yellow'

3. Random-Number Generation

import random
random.seed(10)
for roll in range(10):
    print(random.

Continue reading

German Credit and Regression Tree

Objective

Train a model and use it to make predictions for the German Credit dataset.

Data

german = read.csv(path)
str(german)
## 'data.frame': 1000 obs. of 21 variables:
##  $ default : int 0 1 0 0 1 0 0 0 0 1 ...
##  $ account_check_status : Factor w/ 4 levels "< 0 DM",">= 200 DM / salary assignments for at least 1 year",..: 1 3 4 1 1 4 4 3 4 3 .

Continue reading

Correlation and Regression

path <- "C:/Users/andre/OneDrive/Área de Trabalho/salerno/blogdown/datasets/ncbirths"
path <- paste0(path, "/ncbirths.csv")
data <- read.csv(path, stringsAsFactors = FALSE)
dim(data)
## [1] 1450 15
names(data)
##  [1] "ID"            "Plural"        "Sex"           "MomAge"
##  [5] "Weeks"         "Marital"       "RaceMom"       "HispMom"
##  [9] "Gained"        "Smoke"         "BirthWeightOz" "BirthWeightGm"
## [13] "Low"           "Premie"        "MomRace"
library(ggplot2)
ggplot(data = data, aes(y = BirthWeightOz, x = Weeks)) +
  geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
# Boxplot of weight vs.

Continue reading

Classifying using Logistic Regression

1 - Objective

The objective of this example is to classify each observation as benign or malignant.

2 - Data

Let’s get the data.

BCData <- read.table(url("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"), sep = ",")
# setting column names
names(BCData) <- c('Id', 'ClumpThickness', 'CellSize', 'CellShape',
                   'MarginalAdhesion', 'SECellSize', 'BareNuclei',
                   'BlandChromatin', 'NormalNucleoli', 'Mitoses', 'Class')

3 - EDA - Exploratory Data Analysis

It’s important to extract preliminary knowledge from the dataset.

dim(BCData)
## [1] 699 11
str(BCData)
## 'data.frame': 699 obs.

Continue reading

Diagnosing breast cancer with the kNN algorithm

1 - Introduction

Could machine learning algorithms detect an abnormal cell process early? We know that this clinical battle is not easy and that many people are involved in it, trying to identify a clear path to a cure. As a complement to human decision-making, could technology reduce the subjective bias inherent in the process and improve our decisions? We know that human judgment is limited when compared with the high processing capacity of computers.

Continue reading

Binary Search Algorithm

def binary_search(lista, item):
    # low and high bound the part of the list that is still being searched
    low = 0
    high = len(lista) - 1
    while low <= high:  # while the search range is not empty
        middle = (low + high) // 2  # check the central element
        guess = lista[middle]
        if guess == item:
            return middle
        if guess > item:  # the guess is too high
            high = middle - 1
        else:  # the guess is too low
            low = middle + 1
    return None

my_list = [1, 3, 5, 7, 9]
print(binary_search(my_list, 3))
## 1
print(binary_search(my_list, -1))
## None
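For comparison, the same lookup can be sketched with the standard library's bisect module. This is only an alternative illustration, not the post's method; the function name binary_search_bisect is made up for this example, and it assumes the input list is already sorted, just like the version above.

```python
from bisect import bisect_left


def binary_search_bisect(sorted_list, item):
    """Return the index of item in sorted_list, or None if it is absent."""
    # bisect_left returns the leftmost position where item could be inserted
    # while keeping the list sorted.
    position = bisect_left(sorted_list, item)
    # The item is present only if that position holds an equal element.
    if position < len(sorted_list) and sorted_list[position] == item:
        return position
    return None


my_list = [1, 3, 5, 7, 9]
print(binary_search_bisect(my_list, 3))   # 1
print(binary_search_bisect(my_list, -1))  # None
```

Both versions halve the search range at each step, so they take a number of steps proportional to the logarithm of the list length.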

Continue reading

Quicksort Algorithm

def quicksort(array):
    if len(array) < 2:
        # base case: lists with 0 or 1 element are already sorted
        return array
    else:
        # recursive case
        pivo = array[0]
        # sub-array of all elements less than or equal to the pivot
        menores = [i for i in array[1:] if i <= pivo]
        # sub-array of all elements greater than the pivot
        maiores = [i for i in array[1:] if i > pivo]
        return quicksort(menores) + [pivo] + quicksort(maiores)

print(quicksort([10, 5, 2, 3]))
## [2, 3, 5, 10]
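The version above builds new lists at every level of recursion. As a hedged sketch of a different variant, the following sorts the list in place using the Lomuto partition scheme. Note the assumptions: the function name quicksort_in_place is made up for this example, and it picks the last element as the pivot, not the first element as in the post.

```python
def quicksort_in_place(array, low=0, high=None):
    """Sort array[low..high] in place (Lomuto partition, last element as pivot)."""
    if high is None:
        high = len(array) - 1
    if low >= high:
        # base case: ranges with 0 or 1 element are already sorted
        return
    pivot = array[high]
    boundary = low  # everything left of boundary is <= pivot
    for index in range(low, high):
        if array[index] <= pivot:
            array[index], array[boundary] = array[boundary], array[index]
            boundary += 1
    # put the pivot between the smaller and the larger elements
    array[boundary], array[high] = array[high], array[boundary]
    quicksort_in_place(array, low, boundary - 1)
    quicksort_in_place(array, boundary + 1, high)


numbers = [10, 5, 2, 3]
quicksort_in_place(numbers)
print(numbers)  # [2, 3, 5, 10]
```

The trade-off is readability against memory: the list-comprehension version is shorter and easier to follow, while the in-place version avoids allocating a new list at each call.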

Continue reading

R Packages for Regression

For this post we will present some valuable R packages for regression studies. Check it out!

stats

A very useful package for statistical calculations and random number generation. Below are its most useful functions for regression:

lm(): fits linear models
summary.lm(): returns a summary of a linear model fit
coef(): extracts the coefficients from objects returned by modeling functions
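To make concrete what fitting a linear model and reading its coefficients means, here is a minimal Python sketch of ordinary least squares for a single predictor. It is not how R's lm() is implemented; the function name fit_simple_linear_model and the toy data are made up for illustration.

```python
def fit_simple_linear_model(x_values, y_values):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # slope = sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_model(x, y)
print(intercept, slope)  # 1.0 2.0
```

The two returned numbers play the role of the values that coef() reports for a one-predictor model: the intercept and the coefficient of the predictor.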

Continue reading

Random Forest

In this post we will explore some ideas around the Random Forest model.

Objective

We are working with the Boston Housing dataset. This is a regression task: we want to model the price of houses, in thousands of dollars, in the suburbs of Boston. So we are getting our hands dirty with a regression predictive modeling problem. The main goal is to fit a regression model that best explains the variation in the medv variable.

Continue reading

Data Frame

This format is usually used when the information is not contained in just one dimension (vector).

Example

product <- c("Product A", "Product B", "Product C", "Product D", "Product E")
price <- c(5, 15, 4, 6, 8)
table_price_product <- data.frame(product, price)
table_price_product
##     product price
## 1 Product A     5
## 2 Product B    15
## 3 Product C     4
## 4 Product D     6
## 5 Product E     8

Indexing

Access Product D in the products table:

Continue reading