Decision tree in R is not forming with my training data
library(caret)
library(rpart.plot)
car_df <- read.csv("TrainingDataSet.csv", sep = ',', header = TRUE)
str(car_df)

set.seed(3033)
intrain <- createDataPartition(y = car_df$Result, p= 0.7, list = FALSE)
training <- car_df[intrain,]
testing <- car_df[-intrain,]
dim(training)
dim(testing)
anyNA(car_df)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
dtree_fit <- train(Result ~ ., data = training, method = "rpart",
                   parms = list(split = "infromation"),
                   trControl = trctrl,
                   tuneLength = 10)


I get this warning:




Warning message: In nominalTrainWorkflow(x = x, y = y, wts = weights,
info = trainInfo, : There were missing values in resampled
performance measures.




I am trying to classify whether a movie is a hit or a flop using the percentages of positive and negative sentiment. Here is my data:



  dput(car_df) 

structure(list(MovieName = structure(c(20L, 5L, 31L, 26L, 27L,
12L, 36L, 29L, 38L, 4L, 6L, 8L, 10L, 15L, 18L, 21L, 24L, 34L,
35L, 7L, 37L, 25L, 23L, 2L, 11L, 40L, 33L, 28L, 14L, 3L, 17L,
16L, 32L, 22L, 30L, 1L, 19L, 39L, 9L, 13L), .Label = c("#96Movie",
"#alphamovie", "#APrivateWar", "#AStarIsBorn", "#BlackPanther",
"#BohemianRhapsody", "#CCV", "#Creed2", "#CrimesOfGrindelwald",
"#Deadpool2", "#firstman", "#GameNight", "#GreenBookMovie", "#grinchmovie",
"#Incredibles2", "#indivisiblemovie", "#InstantFamily", "#JurassicWorld",
"#KolamaavuKokila", "#Oceans8", "#Overlord", "#PariyerumPerumal",
"#RalphBreaksTheInternet", "#Rampage", "#Ratchasan", "#ReadyPlayerOne",
"#RedSparrow", "#RobinHoodMovie", "#Sarkar", "#Seemaraja", "#Skyscraper",
"#Suspiria", "#TheLastKey", "#TheNun", "#ThugsOfHindostan", "#TombRaider",
"#VadaChennai", "#Venom", "#Vishwaroopam2", "#WidowsMovie"), class = "factor"),
PositivePercent = c(40.10554, 67.65609, 80.46796, 71.34831,
45.36082, 68.82591, 46.78068, 63.85787, 47.20497, 32.11753,
63.7, 39.2, 82.76553, 88.78613, 72.18274, 72.43187, 31.0089,
38.50932, 38.9, 19.9, 84.26854, 29.4382, 58.13953, 86.9281,
64.54965, 56, 0, 56.61914, 58.82353, 54.98891, 78.21682,
90, 64.3002, 85.8, 51.625, 67.71894, 92.21557, 53.84615,
40.12158, 68.08081), NegativePercent = c(11.34565, 21.28966,
6.408952, 13.10861, 26.80412, 17.10526, 18.61167, 10.55838,
46.48033, 56.231, 9.9, 12.1, 9.018036, 6.473988, 13.90863,
16.77149, 63.20475, 42.54658, 40.9, 5.4, 3.907816, 2.022472,
10.51567, 3.267974, 15.12702, 15.3, 100, 18.12627, 11.76471,
13.41463, 5.775076, 10, 20.08114, 2.1, 5.5, 7.739308, 0,
34.61538, 12.86727, 10.70707), Result = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Flop", "Hit"
), class = "factor")), class = "data.frame", row.names = c(NA,
-40L))
  • Can you include a sample of your data using the dput(car_df) command?

    – RAB
    Nov 25 '18 at 23:52











  • I have edited the question; please find the data above @user10626943

    – Rakesh
    Nov 25 '18 at 23:55













  • This is a warning, not an error. Does your tree get fitted or not?

    – desertnaut
    Nov 26 '18 at 0:01











  • No, it does not fit @desertnaut, but not all of my features are factors: out of my 4 columns, two are factors and two are numeric. Could that be the reason?

    – Rakesh
    Nov 26 '18 at 0:03













  • I converted them to factors, but I still get the same warning message.

    – Rakesh
    Nov 26 '18 at 0:13













r machine-learning decision-tree
edited Nov 26 '18 at 0:33 by mischva11

asked Nov 25 '18 at 23:48 by Rakesh
1 Answer
> str(car_df)
'data.frame': 40 obs. of 4 variables:
$ MovieName : Factor w/ 40 levels "#96Movie","#alphamovie",..: 20 5 31 26 27 12 36 29 38 4 ...
$ PositivePercent: num 40.1 67.7 80.5 71.3 45.4 ...
$ NegativePercent: num 11.35 21.29 6.41 13.11 26.8 ...
$ Result : Factor w/ 2 levels "Flop","Hit": 2 2 2 2 2 2 2 2 2 1 ...

> with(car_df, table( Result))
Result
Flop Hit
5 35

> dtree_fit
CART

29 samples
3 predictor
2 classes: 'Flop', 'Hit'


So you have an outcome with only 5 flops, and one of the predictors (MovieName) is a factor with 40 distinct values, one per case. The warning is not surprising: every movie is unique and the outcome is severely unbalanced, so some cross-validation folds contain no flops at all, and the performance measures cannot be computed for those resamples. The presence of data does not guarantee the possibility of substantial conclusions; no statistical package can overcome a severe lack of data.
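A quick check (my own sketch, not part of the original answer) makes both problems visible: the class counts are lopsided, and MovieName has one level per row, so it carries no generalizable signal and can simply be left out of the formula.

```r
# Sketch: diagnose the two issues named above (assumes car_df as dput above).
table(car_df$Result)                              # Flop: 5, Hit: 35
length(unique(car_df$MovieName)) == nrow(car_df)  # TRUE: one level per row

# Fit on the two numeric predictors only, excluding the identifier column:
dtree_fit <- train(Result ~ PositivePercent + NegativePercent,
                   data = training, method = "rpart",
                   parms = list(split = "information"),
                   trControl = trctrl, tuneLength = 10)
```

Dropping MovieName does not fix the imbalance by itself, but it removes a predictor that can only memorize individual movies.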



BTW: the split argument is misspelled; it should be (though fixing the typo does not, unsurprisingly, clear the warning):



(split = "information")


If you reduce the number of cross-validation folds so that the flops can be distributed among them, you can get a warning-free result. Whether it will have much validity remains questionable, given the small sample size:



trctrl <- trainControl(method = "repeatedcv", number = 3, repeats = 3)
set.seed(3333)
dtree_fit <- train(Result ~ ., data = training, method = "rpart",
                   parms = list(split = "information"),
                   trControl = trctrl,
                   tuneLength = 10)
# no warning on one of my runs
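Another option (a hedged sketch of my own, not from the original answer) is to rebalance the training set before resampling, e.g. with caret's upSample(), so that every fold is guaranteed to contain flops:

```r
# Sketch: oversample the minority class ("Flop") to match the majority.
library(caret)
set.seed(3333)
up_train <- upSample(x = training[, c("PositivePercent", "NegativePercent")],
                     y = training$Result, yname = "Result")
table(up_train$Result)   # both classes now have the same count

dtree_fit_up <- train(Result ~ ., data = up_train, method = "rpart",
                      parms = list(split = "information"),
                      trControl = trctrl, tuneLength = 10)
```

With only 40 movies, though, more data is still the better fix; oversampling just makes the cross-validation mechanics work.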
  • There's no need to be snarky; there are much nicer ways to say what you did.

    – RAB
    Nov 26 '18 at 0:42











  • So what can I do to overcome this issue? Please help.

    – Rakesh
    Nov 26 '18 at 0:44











  • Get more data, or increase the amount of flop data that gets passed to the CV steps.

    – 42-
    Nov 26 '18 at 0:46











  • Is that the reason for the above warning? Lack of flop data? @42

    – Rakesh
    Nov 26 '18 at 0:54











  • I ran your code @42-; as you said, the warning did not pop up. But: Accuracy 0.862963, Kappa 0, and the tuning parameter 'cp' was held constant at a value of 0.

    – Rakesh
    Nov 26 '18 at 1:04










