[GLASSFISH-10018] domain is not cleaned up upon deleting
Created: 05/Oct/09  Updated: 18/Nov/09  Resolved: 18/Nov/09

Status: Resolved
Project: glassfish
Component/s: admin
Affects Version/s: V3
Fix Version/s: V3
Type: Bug
Priority: Major
Reporter: sankarpn
Assignee: Bill Shannon
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Operating System: All
Platform: PC
Attachments: del_dom.log
Issuezilla Id: 10,018
Description
I simply create/start/stop/delete 2 domains but in one of them the autodeploy
directory is not cleaned up.
Steps to reproduce:
-----------------
asadmin create-domain --adminport 10000 --nopassword testdomain
asadmin start-domain testdomain
asadmin create-domain --adminport 11000 --nopassword testdomain2
asadmin start-domain testdomain2
asadmin list-domains
asadmin stop-domain testdomain2
asadmin delete-domain testdomain2
asadmin stop-domain testdomain
asadmin delete-domain testdomain
asadmin list-domains
Logs
asadmin create-domain --nopassword domain1
Using port 4848 for Admin.
Using default port 8080 for HTTP Instance.
Using default port 7676 for JMS.
Using default port 3700 for IIOP.
Using default port 8181 for HTTP_SSL.
Using default port 3820 for IIOP_SSL.
Using default port 3920 for IIOP_MUTUALAUTH.
Using default port 8686 for JMX_ADMIN.
Distinguished Name of the self-signed X.509 Server Certificate is:
[CN=easqeopt19,OU=GlassFish,O=Sun Microsystems,L=Santa Clara,ST=California,C=US]
No domain initializers found, bypassing customization step
Domain domain1 created.
Domain domain1 admin port is 4848.
Domain domain1 allows admin login as user "admin" with no password.
Command create-domain executed successfully.
# sh -x test
+ asadmin create-domain --adminport 10000 --nopassword testdomain
Using port 10000 for Admin.
Using default port 8080 for HTTP Instance.
Using default port 7676 for JMS.
Using default port 3700 for IIOP.
Using default port 8181 for HTTP_SSL.
Using default port 3820 for IIOP_SSL.
Using default port 3920 for IIOP_MUTUALAUTH.
Using default port 8686 for JMX_ADMIN.
Distinguished Name of the self-signed X.509 Server Certificate is:
[CN=easqeopt19,OU=GlassFish,O=Sun Microsystems,L=Santa Clara,ST=California,C=US]
No domain initializers found, bypassing customization step
Domain testdomain created.
Domain testdomain admin port is 10000.
Domain testdomain allows admin login as user "admin" with no password.
Command create-domain executed successfully.
+ asadmin start-domain testdomain
Waiting for DAS to start. ........................
Started domain: testdomain
Domain location:
/export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Log file:
/export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Admin port for the domain: 10000
Command start-domain executed successfully.
+ asadmin create-domain --adminport 11000 --nopassword testdomain2
Using port 11000 for Admin.
Default port 8080 for HTTP Instance is in use. Using 59707
Default port 7676 for JMS is in use. Using 45878
Default port 3700 for IIOP is in use. Using 61072
Default port 8181 for HTTP_SSL is in use. Using 52098
Using default port 3820 for IIOP_SSL.
Using default port 3920 for IIOP_MUTUALAUTH.
Default port 8686 for JMX_ADMIN is in use. Using 62084
Distinguished Name of the self-signed X.509 Server Certificate is:
[CN=easqeopt19,OU=GlassFish,O=Sun Microsystems,L=Santa Clara,ST=California,C=US]
No domain initializers found, bypassing customization step
Domain testdomain2 created.
Domain testdomain2 admin port is 11000.
Domain testdomain2 allows admin login as user "admin" with no password.
Command create-domain executed successfully.
+ asadmin start-domain testdomain2
Waiting for DAS to start. ........................
Started domain: testdomain2
Domain location:
/export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Log file:
/export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Admin port for the domain: 11000
Command start-domain executed successfully.
+ asadmin list-domains
Name: domain1 Status: Not Running
Name: testdomain2 Status: Running
Name: testdomain Status: Running
Command list-domains executed successfully.
+ asadmin stop-domain testdomain2
Waiting for the domain to stop .............
Command stop-domain executed successfully.
+ asadmin delete-domain testdomain2
Domain testdomain2 deleted.
Command delete-domain executed successfully.
+ asadmin stop-domain testdomain
Waiting for the domain to stop .................
Command stop-domain executed successfully.
+ asadmin delete-domain testdomain
Domain testdomain deleted.
Command delete-domain executed successfully.
+ asadmin list-domains
Name: domain1 Status: Not Running
Command list-domains executed successfully.
# ls -l glassfishv3/glassfish/domains/
total 3
drwxr-xr-x 9 root root 10 2009-10-05 22:35 domain1
drwxr-xr-x 3 root root 3 2009-10-05 22:35 testdomain
# ls -l glassfishv3/glassfish/domains/testdomain/
total 2
drwxr-xr-x 2 root root 2 2009-10-05 22:35 autodeploy-bundles
# ls -l glassfishv3/glassfish/domains/testdomain/autodeploy-bundles/
total 0
Comments
Comment by Bill Shannon [ 05/Oct/09 ]
I can't reproduce this.
Try inserting a sleep between the stop-domain and delete-domain.
If that prevents you from reproducing the problem, then most likely
there's a race condition. stop-domain waits until the server stops
responding to the network. The server can still be running after
it stops responding to the network, and that might prevent delete-domain
from being able to delete all the files (although only on Windows, I
believe).
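As a rough illustration of the suggested experiment, the sketch below runs stop-domain, pauses, and then runs delete-domain. The function name stop_then_delete and the ASADMIN and SETTLE_SECONDS variables are made up for this example; only the stop-domain and delete-domain subcommands come from the report, and the 5-second default is an arbitrary guess.

```shell
#!/bin/sh
# Workaround sketch: pause between stop-domain and delete-domain.
# ASADMIN and SETTLE_SECONDS are illustrative names, not GlassFish settings.
ASADMIN=${ASADMIN:-asadmin}
SETTLE_SECONDS=${SETTLE_SECONDS:-5}

stop_then_delete() {
    domain=$1
    # Stop first; give up if the stop command itself fails.
    "$ASADMIN" stop-domain "$domain" || return 1
    # The server may still be running after it stops answering on the
    # network, so wait a little before removing its files.
    sleep "$SETTLE_SECONDS"
    "$ASADMIN" delete-domain "$domain"
}
```

With this loaded, "stop_then_delete testdomain" would replace the separate stop-domain and delete-domain lines in the test script. If the leftover directory no longer appears, that points to a timing problem rather than a failure in delete-domain itself.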
Comment by sankarpn [ 06/Oct/09 ]
Yes, adding a wait after stop-domain solves the issue, but why is it happening in
recent builds?
This is happening on Solaris 10 x86, not Windows.
I have been using the same machine for a while, but I started seeing this issue only
a few days ago.
Comment by Bill Shannon [ 06/Oct/09 ]
I have no idea why it just started happening. If stop-domain got slower
or delete-domain got faster, the race condition would be more likely to
occur.
Can you run this without the sleep and replace the failing delete-domain
with "truss -t rmdir asadmin delete-domain ..."?
Comment by sankarpn [ 06/Oct/09 ]
Created an attachment (id=3433)
truss log
Comment by Bill Shannon [ 06/Oct/09 ]
Did the delete-domain fail to remove all the files when you ran it under
truss? It doesn't look like it failed. truss may have changed the timing
just enough to avoid the race condition.
It would be easy to make the delete-domain command more persistent about
removing the files, perhaps trying a few times. I was hoping to see some
evidence that that's the problem it's running into, but I don't see it
here.
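The idea of making the removal more persistent could look roughly like the following shell sketch. It is not the delete-domain implementation; remove_with_retries, the attempt count, and the delay are all hypothetical. Retrying like this only helps when files are briefly undeletable. It does nothing if another process recreates the directory after the last attempt.

```shell
#!/bin/sh
# remove_with_retries DIR [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: try to remove DIR up to ATTEMPTS times, pausing
# DELAY_SECONDS between tries. Returns 0 only if DIR is gone afterwards.
remove_with_retries() {
    dir=$1
    attempts=${2:-3}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Pause before every attempt except the first.
        if [ "$i" -gt 0 ]; then
            sleep "$delay"
        fi
        rm -rf -- "$dir" 2>/dev/null
        # Check the result directly instead of trusting rm's exit status.
        if [ ! -e "$dir" ]; then
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, "remove_with_retries glassfishv3/glassfish/domains/testdomain 5 2" would make up to five attempts, two seconds apart.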
Comment by sankarpn [ 06/Oct/09 ]
The delete-domain command reports success, but the following directory and files are
still present.
I encounter the issue only when the next create-domain fails to create a
domain with the same name. When I then check the domains directory, I see the
following files.
# ls -l glassfishv3/glassfish/domains/testdomain/
total 2
drwxr-xr-x 2 root root 2 2009-10-05 22:35 autodeploy-bundles
# ls -l glassfishv3/glassfish/domains/testdomain/autodeploy-bundles/
total 0
Comment by Bill Shannon [ 06/Oct/09 ]
Ok, now I'm not clear.
The truss output shows that delete-domain successfully removes the domain
directory. If you look at the domain directory after the delete-domain,
but before you try another create-domain, is the directory gone?
Also, it looks like you're running as root, which is probably not a good
idea in general. If you run as another user, does the problem still occur?
Is your domain directory on an NFS filesystem?
Comment by sankarpn [ 06/Oct/09 ]
>The truss output shows that delete-domain successfully removes the domain
>directory. If you look at the domain directory after the delete-domain,
>but before you try another create-domain, is the directory gone?
No, the following directory is still present.
# ls -l glassfishv3/glassfish/domains/testdomain/
total 2
drwxr-xr-x 2 root root 2 2009-10-05 22:35 autodeploy-bundles
# ls -l glassfishv3/glassfish/domains/testdomain/autodeploy-bundles/
total 0
>Also, it looks like you're running as root, which is probably not a good
>idea in general. If you run as another user, does the problem still occur?
I will check it.
>Is your domain directory on an NFS filesystem?
No, it is a local file system.
Comment by Bill Shannon [ 06/Oct/09 ]
The truss output clearly shows that it successfully removes the directory.
I don't believe anything in the server will recreate the domain directory.
Before doing the delete-domain, can you attach to the server process using
truss: "truss -t mkdir -p <pid> -o /tmp/truss.out". Then run the delete-domain
with truss as before.
Comment by sankarpn [ 06/Oct/09 ]
I attached truss to the server process before stop-domain and delete-domain,
since I cannot attach it after stop-domain. This is what I found:
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/42: Received signal #39, SIGJVM1 [caught]
/42: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/40: Received signal #39, SIGJVM1 [caught]
/40: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/43: Received signal #39, SIGJVM1, in lwp_cond_wait() [caught]
/43: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/60: Received signal #13, SIGPIPE [caught]
/60: Received signal #13, SIGPIPE [caught]
/39: Received signal #39, SIGJVM1 [caught]
/39: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/53:
mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo
0777) Err#2 ENOENT
/53:
mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo
0777) Err#2 ENOENT
/53:
mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain",
0777) = 0
/53:
mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo
0777) = 0
/53:
mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo
0777) = 0
/41: Received signal #39, SIGJVM1 [caught]
/41: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/27: Received signal #39, SIGJVM1, in lwp_cond_wait() [caught]
/27: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/20: Received signal #39, SIGJVM1 [caught]
/20: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
Another interesting thing I found is that immediately after delete-domain the
testdomain directory is cleaned up, but after a few seconds I see that the
autodeploy directory is recreated.
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/osgi-cache
=0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/applibs
=0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/databas
=0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/ext")
=0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/classes
=0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib")
=0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain")
=0
ls -l
/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain:
No such file or directory
sleep 5
ls -l
/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
total 1
drwxr-xr-x 3 sn101972 staff 3 Oct 6 12:56 autodeploy
Comment by sankarpn [ 06/Oct/09 ]
The last section got a little bit messed up. Here is the correct one.
Another interesting thing I found is that immediately after delete-domain the
testdomain directory is cleaned up, but after a few seconds I see that the
autodeploy directory is recreated.
ls -l /export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain: No such file or directory
sleep 5
ls -l /export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
total 1
drwxr-xr-x 3 sn101972 staff 3 Oct 6 12:56 autodeploy
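The manual check above (ls, sleep, ls) could be turned into a small polling helper for a test script. This is only a sketch: reappears_within is a hypothetical name, and the one-second polling interval is an arbitrary choice.

```shell
#!/bin/sh
# reappears_within DIR SECONDS
# Hypothetical helper: poll once per second and return 0 as soon as DIR
# exists, or 1 if it is still absent after roughly SECONDS seconds.
reappears_within() {
    dir=$1
    limit=$2
    elapsed=0
    while [ "$elapsed" -le "$limit" ]; do
        if [ -e "$dir" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

Called right after delete-domain with the domain directory and a limit such as 10, a zero exit status would indicate that something recreated the directory after it was removed.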
Comment by Bill Shannon [ 07/Oct/09 ]
We're going to do several things to improve the situation here:
1. Tim is going to change the autodeploy code to not recreate the domain
directory.
2. Byron is going to change the server shutdown code to remove the pid file
as one of the last steps before the server shuts down.
3. I'm going to change the CLI stop-domain command to wait until the pid file
disappears before declaring that the server is shut down.
Together I think these changes, while not a perfect solution, will sufficiently
resolve the problem.
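The waiting logic in item 3 can be sketched in shell as follows. This is an illustration of the idea only, not the actual stop-domain change; wait_for_pid_file_gone is a hypothetical name, the pid file path is whatever the caller passes in, and the 60-second default timeout is an assumption.

```shell
#!/bin/sh
# wait_for_pid_file_gone PIDFILE [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until PIDFILE no longer exists.
# Returns 0 when the file is gone, 1 if it is still there after the timeout.
wait_for_pid_file_gone() {
    pidfile=$1
    timeout=${2:-60}
    waited=0
    while [ -e "$pidfile" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Because the server would remove the pid file as one of its last shutdown steps (item 2), the file's disappearance is a better signal that the process is nearly finished than the point at which it stops answering on the network.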
Comment by Tim Quinn [ 07/Oct/09 ]
Checked in fixes for auto-deployer.
Author: tjquinn
Date: 2009-10-08 00:47:43+0000
New Revision: 32422
Modified:
trunk/v3/deployment/autodeploy/src/main/java/org/glassfish/deployment/autodeploy/AutoDeployedFilesManage
trunk/v3/deployment/autodeploy/src/main/java/org/glassfish/deployment/autodeploy/AutoDeployer.java
Comment by Bill Shannon [ 08/Oct/09 ]
Final fixes checked in.
Comment by sankarpn [ 18/Nov/09 ]
verified.
Generated at Mon Mar 07 08:23:34 UTC 2016 using JIRA 6.2.3#6260sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.