Cask CDAP services started, but not running during installation



























After following the docs for installing CDAP on a MapR system (v6.0) and starting the CDAP services (https://docs.cask.co/cdap/current/en/admin-manual/installation/mapr.html#starting-cdap-services), I am finding that some CDAP services are not running after startup, even though the startup loop shows no errors. The output from starting the services and checking their status is shown below:



[root@mapr007 conf]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i start ; done
/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:03:01 HST 2018 Starting CDAP Auth Server service on mapr007.org.local


/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:03:04 HST 2018 Starting CDAP Kafka Server service on mapr007.org.local


/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:03:07 HST 2018 Starting CDAP Master service on mapr007.org.local


Warning: Unable to determine $DRILL_HOME
Wed Nov 21 16:03:48 HST 2018 Ensuring required HBase coprocessors are on HDFS
Wed Nov 21 16:04:00 HST 2018 Running CDAP Master startup checks -- this may take a few minutes
/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:04:15 HST 2018 Starting CDAP Router service on mapr007.org.local


/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:04:17 HST 2018 Starting CDAP UI service on mapr007.org.local



[root@mapr007 conf]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i status ; done
/usr/bin/id: cannot find name for group ID 504
PID file /var/cdap/run/auth-server-cdap.pid exists, but process 12126 does not appear to be running
/usr/bin/id: cannot find name for group ID 504
CDAP Kafka Server running as PID 12653
/usr/bin/id: cannot find name for group ID 504
PID file /var/cdap/run/master-cdap.pid exists, but process 15789 does not appear to be running
/usr/bin/id: cannot find name for group ID 504
CDAP Router running as PID 16184
/usr/bin/id: cannot find name for group ID 504
CDAP UI running as PID 16308


Note that while there is an "Unable to determine $DRILL_HOME" warning, I don't think it should be a big problem, since I have set explore.enabled to false in cdap-site.xml.
Looking at cdap-site.xml, the web UI port appears to be set to the default 11011, yet I can't reach the UI (if only to check whether it would tell me more about any errors), even though the UI service reports as running.



Checking the PIDs reported above, I see:



# looking at the processes reported as not running
[root@mapr007 conf.dist]# ps -p 12126
PID TTY TIME CMD
[root@mapr007 conf.dist]# ps -p 15789
PID TTY TIME CMD

# looking at the rest of the processes
[root@mapr007 conf.dist]# ps -p 12653
PID TTY TIME CMD
12653 ? 00:08:12 java
[root@mapr007 conf.dist]# ps -p 16184
PID TTY TIME CMD
16184 ? 00:03:02 java
[root@mapr007 conf.dist]# ps -p 16308
PID TTY TIME CMD
16308 ? 00:00:01 node
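
The manual `ps -p` checks above can be folded into one helper. This is a minimal sketch, not part of the CDAP scripts: the function name `pidfile_state` is hypothetical, and the only path assumed is the `/var/cdap/run` directory that appears in the status output.

```shell
# Report whether the PID recorded in a PID file belongs to a live process.
# Prints "running", "stale", or "missing".
pidfile_state() {
    pidfile=$1
    [ -f "$pidfile" ] || { echo "missing"; return 0; }
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # It can also fail for lack of permission, so run this as root.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Example (directory taken from the status output above):
# for f in /var/cdap/run/*.pid; do echo "$f: $(pidfile_state "$f")"; done
```

A "stale" result corresponds to the "PID file ... exists, but process ... does not appear to be running" message: the service wrote its PID and then exited, so the service's own log is the next place to look.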


I also checked whether the default security.auth.server.bind.port (10009) was being used by some other service:



[root@mapr007 conf.dist]# netstat -anp | grep 10009


but nothing detected.
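
A plain `grep 10009` matches that string anywhere on the line, including a PID or a remote port. A stricter filter could look like the sketch below. The function name `listeners_on_port` is hypothetical, and the column layout (local address in field 4, state in field 6) is an assumption based on typical Linux net-tools `netstat` output.

```shell
# Read `netstat -anp`-style output on stdin and print only the TCP sockets
# that are LISTENing on exactly the given local port.
listeners_on_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { print }
    '
}

# Example: netstat -anp | listeners_on_port 10009
```

Empty output from this filter means nothing is listening on that port, which also means the auth server itself never bound to it.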



Not sure where to start debugging from here, so any suggestions or information would be appreciated.





UPDATE



After restarting the services to try to get more logging data, I now see some errors (better than failing silently, I guess):



[root@mapr007 conf.dist]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i stop ; done
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:29 HST 2018 Stopping CDAP Auth Server ...
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:29 HST 2018 Stopping CDAP Kafka Server ....

/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:30 HST 2018 Stopping CDAP Master ...
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:31 HST 2018 Stopping CDAP Router ....

/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:32 HST 2018 Stopping CDAP UI ....

[root@mapr007 conf.dist]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i start ; done
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:41 HST 2018 Starting CDAP Auth Server service on mapr007.org.local

/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:44 HST 2018 Starting CDAP Kafka Server service on mapr007.org.local

/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:47 HST 2018 Starting CDAP Master service on mapr007.org.local

Warning: Unable to determine $DRILL_HOME
Mon Nov 26 11:07:17 HST 2018 Ensuring required HBase coprocessors are on HDFS
Mon Nov 26 11:08:57 HST 2018 Running CDAP Master startup checks -- this may take a few minutes
[ERROR] Master startup checks failed. Please check /var/log/cdap/master-cdap-mapr007.org.local.log to address issues.
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:10:08 HST 2018 Starting CDAP Router service on mapr007.org.local

/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:10:11 HST 2018 Starting CDAP UI service on mapr007.org.local


Checking the contents of /var/log/cdap/master-cdap-mapr007.org.local.log, I see the following at the bottom:



...
...
...
2018-11-26 11:10:06,996 - ERROR [main:c.c.c.m.s.MasterStartupTool@109] - YarnCheck failed with RuntimeException: Unable to get status of YARN nodemanagers. Please check that YARN is running and that the correct Hadoop configuration (core-site.xml, yarn-site.xml) and libraries are included in the CDAP master classpath.
java.lang.RuntimeException: Unable to get status of YARN nodemanagers. Please check that YARN is running and that the correct Hadoop configuration (core-site.xml, yarn-site.xml) and libraries are included in the CDAP master classpath.
at co.cask.cdap.master.startup.YarnCheck.run(YarnCheck.java:79) ~[co.cask.cdap.cdap-master-5.1.0.jar:na]
at co.cask.cdap.common.startup.CheckRunner.runChecks(CheckRunner.java:51) ~[co.cask.cdap.cdap-common-5.1.0.jar:na]
at co.cask.cdap.master.startup.MasterStartupTool.canStartMaster(MasterStartupTool.java:106) [co.cask.cdap.cdap-master-5.1.0.jar:na]
at co.cask.cdap.master.startup.MasterStartupTool.main(MasterStartupTool.java:96) [co.cask.cdap.cdap-master-5.1.0.jar:na]
Caused by: java.util.concurrent.TimeoutException: null
at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[na:1.8.0_181]
at co.cask.cdap.master.startup.YarnCheck.run(YarnCheck.java:76) ~[co.cask.cdap.cdap-master-5.1.0.jar:na]
... 3 common frames omitted
2018-11-26 11:10:07,006 - ERROR [main:c.c.c.m.s.MasterStartupTool@113] - Root cause: TimeoutException:
2018-11-26 11:10:07,006 - ERROR [main:c.c.c.m.s.MasterStartupTool@116] - Errors detected while starting up master. Please check the logs, address all errors, then try again.
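
The root cause in the log is a TimeoutException, meaning the YARN status request never returned rather than being rejected. One way to reproduce that distinction outside CDAP is to run a YARN CLI query under a deadline. This is only a sketch: the helper name `probe_with_deadline` is hypothetical, it is not how CDAP's YarnCheck is implemented, and it assumes GNU coreutils `timeout` is available.

```shell
# Run a command under a deadline and classify the outcome.
# Prints "ok", "timeout", or "failed". Relies on GNU coreutils `timeout`,
# which exits with status 124 when the deadline expires.
probe_with_deadline() {
    secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    case $? in
        0)   echo "ok" ;;
        124) echo "timeout" ;;
        *)   echo "failed" ;;
    esac
}

# Example: probe_with_deadline 60 yarn node -list
```

A "timeout" here would suggest the node cannot reach or authenticate to the ResourceManager, while "failed" points more toward a client-side configuration or classpath problem.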


Following the "CDAP services on Distributed CDAP aren't starting up due to an exception. What should I do?" FAQ in the docs did not seem to help (https://docs.cask.co/cdap/current/en/faqs/cdap.html#cdap-services-on-distributed-cdap-aren-t-starting-up-due-to-an-exception-what-should-i-do).



Will continue debugging, but would appreciate any opinion on these new errors.










      mapr cdap














      edited Nov 27 '18 at 2:25







      lampShadesDrifter

















      asked Nov 23 '18 at 23:34

































          1 Answer














          Restarting the ResourceManager and NodeManager services on the cluster seems to have resolved this error. This was mostly a guess by another dev, based only on the fact that the error indicated CDAP was unable to connect to YARN even though the cluster's RM and NM services appeared to be running fine.



          Furthermore, the CDAP installation docs for enabling Kerberos (https://docs.cask.co/cdap/current/en/admin-manual/installation/mapr.html#enabling-kerberos) specify using a special keyword _HOST, e.g.



          <property>
            <name>cdap.master.kerberos.keytab</name>
            <value>/etc/security/keytabs/cdap.service.keytab</value>
          </property>

          <property>
            <name>cdap.master.kerberos.principal</name>
            <value><cdap-principal>/_HOST@EXAMPLE.COM</value>
          </property>


          where _HOST is not just a doc placeholder but a special keyword that is supposed to be filled in automatically (e.g. see https://mapr.com/docs/60/Hive/Config-HiveMetastoreForKerberos.html and https://mapr.com/docs/60/SecurityGuide/Config-YARN-Kerberos.html).



          Apparently, on MapR client nodes (i.e. nodes that are not control or data nodes and simply run the MapR client package to interact with the cluster), this does not work, and the host name in the Kerberos principal must be given explicitly (I'm pretty sure docs for this exist, but I can't find them at this time). This was discovered when examining the logs further and seeing that the CDAP services were trying to connect to _HOST@us.org instead of, say, the.actual.domain@us.org.
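
To make the expected substitution concrete, here is a small sketch of what filling in _HOST amounts to. The function name `expand_host_principal` and the `cdap` service name in the example are hypothetical (the docs only show `<cdap-principal>`), and this is not the code CDAP or Hadoop actually runs.

```shell
# Replace the first _HOST token in a Kerberos principal with a host name.
# Principals without _HOST are returned unchanged.
expand_host_principal() {
    principal=$1
    host=$2
    case $principal in
        *_HOST*) printf '%s%s%s\n' "${principal%%_HOST*}" "$host" "${principal#*_HOST}" ;;
        *)       printf '%s\n' "$principal" ;;
    esac
}

# Example: expand_host_principal 'cdap/_HOST@EXAMPLE.COM' "$(hostname -f)"
```

On a node where the automatic substitution does not happen, the value printed by a helper like this is what would have to be written literally into cdap-site.xml.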






                edited Nov 27 '18 at 21:55

























                answered Nov 27 '18 at 2:25









                lampShadesDrifter































