Payara5 systemd service going to timeout on system reboot

ERRORS:

Aug 3 07:52:49 systemd: Failed to start Payara 5 Server.
Aug 3 07:52:49 systemd: Unit payara5_supplytool.service entered failed state.
Aug 3 07:52:49 systemd: payara5_test.service failed.

Aug 2 03:29:02 ip-10-140-126-132 asadmin: Waiting for test to start …
Aug 2 03:29:02 ip-10-140-126-132 asadmin: No response from the Domain Administration Server (supplytool) after 600 seconds.
Aug 2 03:29:02 ip-10-140-126-132 asadmin: The command is either taking too long to complete or the server has failed.
Aug 2 03:29:02 ip-10-140-126-132 asadmin: Please see the server log files for command status.
Aug 2 03:29:02 ip-10-140-126-132 asadmin: Please start with the --verbose option in order to see early messages.
Aug 2 03:29:02 ip-10-140-126-132 asadmin: Command start-domain failed.

checked the logs and did not find anything wrong with domain however same ssystemd can start domain when Ec2 os up and running…

Please suggest what else i can check further

using this systemd file to start domain1 on boot

[Unit]
Description = Payara 5 Server
After = network.target syslog.target

[Service]
User = ec2-user
ExecStart = /payara5/p5_current_version/bin/asadmin start-domain domain1
ExecReload = /payara5/p5_current_version/bin/asadmin restart-domain domian1
ExecStop = /payara5/p5_current_version/bin/asadmin stop-domain domain1
TimeoutStartSec = 900
TimeoutStopSec = 30
Type=forking

[Install]
WantedBy = multi-user.target

Hi @marif,

The server.log should contain some information about what the server was doing before there was a timeout. You stated this works as expected when the EC2 instance is up and running, if your DAS is on that instance, then it would be required to be running for remote nodes and instances to start successfully.

Are you trying to start a remote node or instance without the DAS running and available for nodes to connect to?

Thanks,
James

DAS is not on.
All are local instances.

here are some logs from start

[2022-08-10T07:21:01.701+0200] [Payara 5.2022.2] [INFO] [NCLS-LOGGING-00009] [javax.enterprise.logging] [tid: _ThreadID=46 _ThreadName=RunLevelControllerThread-1660108859264] [timeMillis: 1660108861701] [levelValue: 800] [[
Running Payara Version: Payara Server 5.2022.2 #badassfish (build 306)]]

[2022-08-10T07:21:02.001+0200] [Payara 5.2022.2] [INFO] [NCLS-LOGGING-00010] [javax.enterprise.logging] [tid: _ThreadID=46 _ThreadName=RunLevelControllerThread-1660108859264] [timeMillis: 1660108862001] [levelValue: 800] [[
Server log file is using Formatter class: com.sun.enterprise.server.logging.ODLLogFormatter]]

[2022-08-10T07:21:13.677+0200] [Payara 5.2022.2] [INFO] [org.glassfish.ha.store.spi.BackingStoreFactoryRegistry] [tid: _ThreadID=48 _ThreadName=RunLevelControllerThread-1660108859452] [timeMillis: 1660108873677] [levelValue: 800] [[
Registered fish.payara.ha.hazelcast.store.HazelcastBackingStoreFactoryProxy for persistence-type = hazelcast in BackingStoreFactoryRegistry]]

[2022-08-10T07:22:00.569+0200] [Payara 5.2022.2] [INFO] [fish.payara.nucleus.hazelcast.HazelcastCore] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108920569] [levelValue: 800] [[
Hazelcast Instance Bound to JNDI at payara/Hazelcast]]

[2022-08-10T07:22:00.570+0200] [Payara 5.2022.2] [INFO] [fish.payara.nucleus.hazelcast.HazelcastCore] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108920570] [levelValue: 800] [[
JSR107 Caching Provider Bound to JNDI at payara/CachingProvider]]

[2022-08-10T07:22:00.570+0200] [Payara 5.2022.2] [INFO] [fish.payara.nucleus.hazelcast.HazelcastCore] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108920570] [levelValue: 800] [[
JSR107 Default Cache Manager Bound to JNDI at payara/CacheManager]]

[2022-08-10T07:22:06.738+0200] [Payara 5.2022.2] [INFO] [NCLS-CORE-00101] [javax.enterprise.system.core] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108926738] [levelValue: 800] [[
Network Listener http-listener-1 started in: 788ms - bound to [/0.0.0.0:12580]]]

[2022-08-10T07:22:06.736+0200] [Payara 5.2022.2] [INFO] [org.hibernate.validator.internal.util.Version] [tid: _ThreadID=48 _ThreadName=RunLevelControllerThread-1660108859452] [timeMillis: 1660108926736] [levelValue: 800] [[
HV000001: Hibernate Validator 6.1.5.Final]]

[2022-08-10T07:22:08.904+0200] [Payara 5.2022.2] [INFO] [NCLS-CORE-00101] [javax.enterprise.system.core] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108928904] [levelValue: 800] [[
Network Listener http-listener-2 started in: 537ms - bound to [/0.0.0.0:12581]]]

[2022-08-10T07:22:09.259+0200] [Payara 5.2022.2] [WARNING] [org.glassfish.grizzly.config.GenericGrizzlyListener] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108929259] [levelValue: 900] [[
HTTP/2 (enabled by default) is unsupported with port unification and will be disabled for network listener admin-listener.]]

[2022-08-10T07:22:10.608+0200] [Payara 5.2022.2] [INFO] [NCLS-CORE-00101] [javax.enterprise.system.core] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108930608] [levelValue: 800] [[
Network Listener admin-listener started in: 784ms - bound to [/0.0.0.0:12582]]]

[2022-08-10T07:22:10.610+0200] [Payara 5.2022.2] [INFO] [NCLS-CORE-00087] [javax.enterprise.system.core] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108930610] [levelValue: 800] [[
Grizzly 2.4.4 started in: 59301ms - bound to [http-listener-1:12580, http-listener-2:12581, admin-listener:12582]]]

[2022-08-10T07:22:16.542+0200] [Payara 5.2022.2] [INFO] [NCLS-CORE-00101] [javax.enterprise.system.core] [tid: _ThreadID=47 _ThreadName=RunLevelControllerThread-1660108859367] [timeMillis: 1660108936542] [levelValue: 800] [[
Network Listener iiop-service started in:

ound to [/0.0.0.0:12583]]]

[2022-08-10T07:22:58.123+0200] [Payara 5.2022.2] [INFO] [NCLS-CORE-00022] [javax.enterprise.system.core] [tid: _ThreadID=46 _ThreadName=RunLevelControllerThread-1660108859264] [timeMillis: 1660108978123] [levelValue: 800] [[
Loading application activemq-rar done in 56138 ms]]

[2022-08-10T07:24:00.209+0200] [Payara 5.2022.2] [INFO] [javax.enterprise.resource.corba.org.glassfish.enterprise.iiop.api] [tid: _ThreadID=46 _ThreadName=RunLevelControllerThread-1660108859264] [timeMillis: 1660109040209] [levelValue: 800] [[
GlassFishORBFactory service initialized.]]

As you can see in the logs it has taken around 4 mins to just initialized the domain.

and domain fails while on starting on boot…

Aug 11 11:01:26 asadmin: Waiting for plmuserform to start …
Aug 11 11:01:26 asadmin: No response from the Domain Administration Server (domain) after 600 seconds.
Aug 11 11:01:26 asadmin: The command is either taking too long to complete or the server has failed.
Aug 11 11:01:26 asadmin: Please see the server log files for command status.
Aug 11 11:01:26 asadmin: Please start with the --verbose option in order to see early messages.
Aug 11 11:01:26 asadmin: Command start-domain failed.

It seems payara5 process are taking too much CPU while on boot

3331 ec2-user 20 0 11.9g 1.6g 18188 S 79.4 1.3 2:06.11 java
5482 ec2-user 20 0 11.9g 782568 27096 S 71.2 0.6 1:30.18 java
4024 ec2-user 20 0 11.9g 1.5g 18932 S 67.6 1.3 2:15.79 java
5026 ec2-user 20 0 8066616 631644 30428 S 62.7 0.5 1:28.52 java
3039 ec2-user 20 0 11.9g 1.8g 18192 S 61.8 1.5 2:09.35 java
4992 ec2-user 20 0 10.3g 600420 27440 S 61.8 0.5 1:28.26 java
3309 ec2-user 20 0 11.9g 1.7g 18876 S 59.2 1.4 2:03.54 java
5064 ec2-user 20 0 7741948 851756 32324 S 57.5 0.7 1:28.23 java
3457 ec2-user 20 0 12.0g 2.1g 18500 S 53.6 1.8 2:02.10 java
4588 ec2-user 20 0 12.0g 2.1g 18792 S 52.6 1.7 2:10.64 java
5623 ec2-user 20 0 9708508 758432 30564 S 52.3 0.6 1:28.63 java
3229 ec2-user 20 0 12.0g 2.1g 18396 S 49.7 1.8 2:09.73 java
3230 ec2-user 20 0 12.0g 2.1g 18348 S 48.0 1.7 2:09.94 java
5303 ec2-user 20 0 7608848 664340 30432 S 47.1 0.5 1:23.38 java
5298 ec2-user 20 0 10.3g 767380 30720 S 45.1 0.6 1:16.71 java
4573 ec2-user 20 0 11.9g 2.0g 18368 S 44.1 1.7 2:04.12 java
3652 ec2-user 20 0 11.9g 1.6g 18416 S 38.9 1.4 1:53.46 java
3398 ec2-user 20 0 11.9g 2.0g 18144 S 36.9 1.7 2:18.16 java
5533 ec2-user 20 0 10.3g 582620 26492 S 35.6 0.5 1:25.58 java
5924 ec2-user 20 0 12.3g 2.4g 31736 S 35.3 2.0 1:13.29 java
5505 ec2-user 20 0 13.4g 1.3g 32664 S 34.6 1.1 1:23.85 java
5920 ec2-user 20 0 6319984 546004 25712 S 32.7 0.4 1:23.19 java
3278 ec2-user 20 0 12.0g 2.2g 18264 S 28.4 1.8 2:07.99 java
5526 ec2-user 20 0 9819.7m 963620 27688 S 28.4 0.8 1:21.47 java
3324 ec2-user 20 0 12.0g 2.1g 18388 S 24.2 1.8 2:03.14 java
5785 ec2-user 20 0 9168960 512692 27588 S 22.2 0.4 1:19.75 java
5075 ec2-user 20 0 9909256 628936 26572 S 21.6 0.5 1:19.20 java
5166 ec2-user 20 0 10.3g 560924 27168 S 21.2 0.4 1:27.51 java

ec2-user@ip-10-140-126-132 ~]$ uptime
16:06:02 up 8 min, 1 user, load average: 141.98, 134.78, 71.76

[ec2-user@ip-10-140-126-132 ~]$ uptime
16:01:25 up 3 min, 1 user, load average: 211.51, 107.97, 42.73

My EC2 has 16 vCPU.

Server.log is not showing much. asadmin try to start application on boot via systemd and after 600s goes to timeout and kill the process.

Hi @marif,
if you run simple local instance, it is DAS itself.

I modified my systemd service to contain the keys you use and it is starting correctly.
I compared the times and yours are insanly high.
E.g. starting http-listener takes 800 ms, while I have 8 ms, similarly Hazelcast (58 s vs 5 s). Mainly the stuff opening ports are very slow.

The output of top is strange – I see only one Payara process. How many apps do you have deployed on the server? And how many domains are you starting at once?

Could you try to start the server/domain only standalone without any settings or applications (I noticed you use activemq)? This would give us some baseline.

Petr