Vision 360

Titanic: A case study for predictive analysis on R (Part 4)

Working with the Titanic data set from Kaggle.com's competition, we predicted passenger survival with 79.426% accuracy in our previous attempt. This time, we will try to learn the missing values instead of simply filling them in with the mean or median.

Let's start with Age. Looking at the available data, we can hypothesize that Age correlates with attributes like Title, Sex, Fare and HasCabin. Also note that we previously created the variable AgePredicted; we will use it here to identify which records were filled in previously.

> library(rpart)
> age_train <- dataset[dataset$AgePredicted == 0, c("Age","Title","Sex","Fare","HasCabin")]
> age_test <- dataset[dataset$AgePredicted == 1, c("Title","Sex","Fare","HasCabin")]
> formula <- Age ~ Title + Sex + Fare + HasCabin
> rp_fit <- rpart(formula, data=age_train, method="class")
> PredAge <- predict(rp_fit, newdata=age_tes
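To make the idea of learning Age from the other columns easier to try out, here is a minimal, self-contained sketch. It does not use the Kaggle files: the toy data frame, its values and the name pred_age are made up for illustration, and only the column names come from the post. One deliberate difference is that the sketch swaps in method="anova" (a regression tree) for the method="class" used above, on the assumption that Age is being treated as a plain numeric value.

```r
library(rpart)

# Toy stand-in for the combined Titanic data frame (all values are made up).
set.seed(42)
n <- 200
dataset <- data.frame(
  Title    = factor(sample(c("Mr", "Mrs", "Miss", "Master"), n, replace = TRUE)),
  Fare     = round(runif(n, 5, 100), 2),
  HasCabin = sample(0:1, n, replace = TRUE)
)
dataset$Sex <- factor(ifelse(dataset$Title %in% c("Mr", "Master"), "male", "female"))

# Let Age depend loosely on Title so the tree has something to learn.
base_age <- c(Master = 5, Miss = 22, Mr = 32, Mrs = 36)
dataset$Age <- unname(base_age[as.character(dataset$Title)]) + rnorm(n, sd = 3)

# Flag 40 rows as "previously filled" and blank out their Age.
dataset$AgePredicted <- 0
dataset$AgePredicted[sample(n, 40)] <- 1
dataset$Age[dataset$AgePredicted == 1] <- NA

# Split into rows with a known Age (training) and rows to fill (test).
predictors <- c("Title", "Sex", "Fare", "HasCabin")
age_train <- dataset[dataset$AgePredicted == 0, c("Age", predictors)]
age_test  <- dataset[dataset$AgePredicted == 1, predictors]

# Assumption: method = "anova" fits a regression tree on numeric Age.
rp_fit <- rpart(Age ~ Title + Sex + Fare + HasCabin,
                data = age_train, method = "anova")

# For an anova tree, predict() returns one numeric value per test row.
pred_age <- predict(rp_fit, newdata = age_test)

# Write the learned ages back into the flagged rows.
dataset$Age[dataset$AgePredicted == 1] <- pred_age
```

With a regression tree, each predicted age is the mean Age of the training rows in the matching leaf, so the filled values always stay within the range of ages seen in training.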
