[GLASSFISH-10018] domain is not cleaned up upon deleting

Created: 05/Oct/09  Updated: 18/Nov/09  Resolved: 18/Nov/09

Status: Resolved
Project: glassfish
Component/s: admin
Affects Version/s: V3
Fix Version/s: V3
Type: Bug
Priority: Major
Reporter: sankarpn
Assignee: Bill Shannon
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Operating System: All
             Platform: PC
Attachments: del_dom.log
Issuezilla Id: 10,018

Description

I simply create/start/stop/delete 2 domains but in one of them the autodeploy directory is not cleaned up.

Reproducing steps.
-----------------
asadmin create-domain --adminport 10000 --nopassword testdomain
asadmin start-domain testdomain
asadmin create-domain --adminport 11000 --nopassword testdomain2
asadmin start-domain testdomain2
asadmin list-domains
asadmin stop-domain testdomain2
asadmin delete-domain testdomain2
asadmin stop-domain testdomain
asadmin delete-domain testdomain
asadmin list-domains

Logs

asadmin create-domain --nopassword domain1
Using port 4848 for Admin.
Using default port 8080 for HTTP Instance.
Using default port 7676 for JMS.
Using default port 3700 for IIOP.
Using default port 8181 for HTTP_SSL.
Using default port 3820 for IIOP_SSL.
Using default port 3920 for IIOP_MUTUALAUTH.
Using default port 8686 for JMX_ADMIN.
Distinguished Name of the self-signed X.509 Server Certificate is:
[CN=easqeopt19,OU=GlassFish,O=Sun Microsystems,L=Santa Clara,ST=California,C=US]
No domain initializers found, bypassing customization step
Domain domain1 created.
Domain domain1 admin port is 4848.
Domain domain1 allows admin login as user "admin" with no password.
Command create-domain executed successfully.

# sh -x test
+ asadmin create-domain --adminport 10000 --nopassword testdomain
Using port 10000 for Admin.
Using default port 8080 for HTTP Instance.
Using default port 7676 for JMS.
Using default port 3700 for IIOP.
Using default port 8181 for HTTP_SSL.
Using default port 3820 for IIOP_SSL.
Using default port 3920 for IIOP_MUTUALAUTH.
Using default port 8686 for JMX_ADMIN.
Distinguished Name of the self-signed X.509 Server Certificate is:
[CN=easqeopt19,OU=GlassFish,O=Sun Microsystems,L=Santa Clara,ST=California,C=US]
No domain initializers found, bypassing customization step
Domain testdomain created.
Domain testdomain admin port is 10000.
Domain testdomain allows admin login as user "admin" with no password.
Command create-domain executed successfully.
+ asadmin start-domain testdomain
Waiting for DAS to start. ........................
Started domain: testdomain
Domain location: /export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Log file: /export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Admin port for the domain: 10000
Command start-domain executed successfully.
+ asadmin create-domain --adminport 11000 --nopassword testdomain2
Using port 11000 for Admin.
Default port 8080 for HTTP Instance is in use. Using 59707
Default port 7676 for JMS is in use. Using 45878
Default port 3700 for IIOP is in use. Using 61072
Default port 8181 for HTTP_SSL is in use. Using 52098
Using default port 3820 for IIOP_SSL.
Using default port 3920 for IIOP_MUTUALAUTH.
Default port 8686 for JMX_ADMIN is in use. Using 62084
Distinguished Name of the self-signed X.509 Server Certificate is:
[CN=easqeopt19,OU=GlassFish,O=Sun Microsystems,L=Santa Clara,ST=California,C=US]
No domain initializers found, bypassing customization step
Domain testdomain2 created.
Domain testdomain2 admin port is 11000.
Domain testdomain2 allows admin login as user "admin" with no password.
Command create-domain executed successfully.
+ asadmin start-domain testdomain2
Waiting for DAS to start. ........................
Started domain: testdomain2
Domain location: /export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Log file: /export/home/user/sankar/ws/v3/workspace/Glassfish_AdminCLI_Testing_Sol_10_X86/glassfishv3/glassfish/do
Admin port for the domain: 11000
Command start-domain executed successfully.
+ asadmin list-domains
Name: domain1 Status: Not Running
Name: testdomain2 Status: Running
Name: testdomain Status: Running
Command list-domains executed successfully.
+ asadmin stop-domain testdomain2
Waiting for the domain to stop .............
Command stop-domain executed successfully.
+ asadmin delete-domain testdomain2
Domain testdomain2 deleted.
Command delete-domain executed successfully.
+ asadmin stop-domain testdomain
Waiting for the domain to stop .................
Command stop-domain executed successfully.
+ asadmin delete-domain testdomain
Domain testdomain deleted.
Command delete-domain executed successfully.
+ asadmin list-domains
Name: domain1 Status: Not Running
Command list-domains executed successfully.

# ls -l glassfishv3/glassfish/domains/
total 3
drwxr-xr-x 9 root root 10 2009-10-05 22:35 domain1
drwxr-xr-x 3 root root 3 2009-10-05 22:35 testdomain

# ls -l glassfishv3/glassfish/domains/testdomain/
total 2
drwxr-xr-x 2 root root 2 2009-10-05 22:35 autodeploy-bundles

# ls -l glassfishv3/glassfish/domains/testdomain/autodeploy-bundles/
total 0

Comments

Comment by Bill Shannon [ 05/Oct/09 ]
I can't reproduce this.

Try inserting a sleep between the stop-domain and delete-domain. If that prevents you from reproducing the problem, then most likely there's a race condition. stop-domain waits until the server stops responding to the network. The server can still be running after it stops responding to the network, and that might prevent delete-domain from being able to delete all the files (although only on Windows, I believe).
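
The suggested experiment (pause between stop-domain and delete-domain) could be scripted roughly as follows. This is only a sketch for reproducing the test: `stop_then_delete`, `ASADMIN`, and `STOP_DELAY` are hypothetical names chosen for illustration, not GlassFish settings, and a fixed sleep only narrows the race window rather than closing it.

```shell
#!/bin/sh
# Hypothetical helper for the experiment described above.
# ASADMIN and STOP_DELAY are illustrative variables, not GlassFish options.
ASADMIN="${ASADMIN:-asadmin}"
STOP_DELAY="${STOP_DELAY:-5}"

stop_then_delete() {
    domain="$1"
    # stop-domain may return while the server process is still exiting.
    "$ASADMIN" stop-domain "$domain" || return 1
    # Give the server time to finish shutting down before removing files.
    sleep "$STOP_DELAY"
    "$ASADMIN" delete-domain "$domain"
}
```

In the reproduction script, each stop-domain/delete-domain pair would become a single call such as `stop_then_delete testdomain2`. If the leftover directory no longer appears with the delay in place, that points to the race condition.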

Comment by sankarpn [ 06/Oct/09 ]
Yes, adding a wait after stop-domain solves the issue, but why is it happening in recent builds? This is happening on Solaris 10 x86, not Windows. I have been using the same machine for a while, but I started seeing this issue only a few days ago.

Comment by Bill Shannon [ 06/Oct/09 ]
I have no idea why it just started happening. If stop-domain got slower or delete-domain got faster, the race condition would be more likely to occur.

Can you run this without the sleep and replace the failing delete-domain with "truss -t rmdir asadmin delete-domain ..."?

Comment by sankarpn [ 06/Oct/09 ]
Created an attachment (id=3433)
truss log

Comment by Bill Shannon [ 06/Oct/09 ]
Did the delete-domain fail to remove all the files when you ran it under truss? It doesn't look like it failed. truss may have changed the timing just enough to avoid the race condition.

It would be easy to make the delete-domain command more persistent about removing the files, perhaps trying a few times. I was hoping to see some evidence that that's the problem it's running into, but I don't see it here.

Comment by sankarpn [ 06/Oct/09 ]
The delete-domain command reports success but the following directory/files are still present. I encounter the issue only when the next create-domain fails to create the domain with the same name. So when I check the domains directory I see the following files.

# ls -l glassfishv3/glassfish/domains/testdomain/
total 2
drwxr-xr-x 2 root root 2 2009-10-05 22:35 autodeploy-bundles

# ls -l glassfishv3/glassfish/domains/testdomain/autodeploy-bundles/
total 0

Comment by Bill Shannon [ 06/Oct/09 ]
Ok, now I'm not clear. The truss output shows that delete-domain successfully removes the domain directory. If you look at the domain directory after the delete-domain, but before you try another create-domain, is the directory gone?

Also, it looks like you're running as root, which is probably not a good idea in general.
If you run as another user, does the problem still occur?

Is your domain directory on an NFS filesystem?

Comment by sankarpn [ 06/Oct/09 ]
>The truss output shows that delete-domain successfully removes the domain
>directory. If you look at the domain directory after the delete-domain,
>but before you try another create-domain, is the directory gone?

No, the following directory is still present.

# ls -l glassfishv3/glassfish/domains/testdomain/
total 2
drwxr-xr-x 2 root root 2 2009-10-05 22:35 autodeploy-bundles

# ls -l glassfishv3/glassfish/domains/testdomain/autodeploy-bundles/
total 0

>Also, it looks like you're running as root, which is probably not a good
>idea in general. If you run as another user, does the problem still occur?

I will check it.

>Is your domain directory on an NFS filesystem?

No, it is a local file system.

Comment by Bill Shannon [ 06/Oct/09 ]
The truss output clearly shows that it successfully removes the directory. I don't believe anything in the server will recreate the domain directory.

Before doing the delete-domain, can you attach to the server process using truss: "truss -t mkdir -p <pid> -o /tmp/truss.out". Then run the delete-domain with truss as before.

Comment by sankarpn [ 06/Oct/09 ]
I attached truss to the server process before stop-domain and delete-domain since I cannot attach it after stop-domain.
This is what I found:

/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/42: Received signal #39, SIGJVM1 [caught]
/42: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/40: Received signal #39, SIGJVM1 [caught]
/40: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/43: Received signal #39, SIGJVM1, in lwp_cond_wait() [caught]
/43: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/60: Received signal #13, SIGPIPE [caught]
/60: Received signal #13, SIGPIPE [caught]
/39: Received signal #39, SIGJVM1 [caught]
/39: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/53: mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo 0777) Err#2 ENOENT
/53: mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo 0777) Err#2 ENOENT
/53: mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain", 0777) = 0
/53: mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo 0777) = 0
/53: mkdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/autodeplo 0777) = 0
/41: Received signal #39, SIGJVM1 [caught]
/41: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/27: Received signal #39, SIGJVM1, in lwp_cond_wait() [caught]
/27: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/59: Received signal #13, SIGPIPE [caught]
/20: Received signal #39, SIGJVM1 [caught]
/20: siginfo: SIGJVM1 pid=20330 uid=102972 code=-1

Another interesting thing I found is that immediately after delete-domain the testdomain is cleaned up, but after a few seconds I see that the autodeploy directory is recreated.

rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/osgi-cache =0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/applibs =0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/databas =0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/ext") =0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib/classes =0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain/lib") =0
rmdir("/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain") =0

ls -l /export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain: No such file or directory
sleep 5
ls -l /export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
total 1
drwxr-xr-x 3 sn101972 staff 3 Oct 6 12:56 autodeploy

Comment by sankarpn [ 06/Oct/09 ]
The last section got a little bit messed up. Here is the correct one.

Another interesting thing I found is that immediately after delete-domain the testdomain is cleaned up, but after a few seconds I see that the autodeploy directory is recreated.

ls -l /export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
/export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain: No such file or directory
sleep 5
ls -l /export/home/user/sn101972/ws/workspace/Testing/glassfishv3/glassfish/domains/testdomain
total 1
drwxr-xr-x 3 sn101972 staff 3 Oct 6 12:56 autodeploy

Comment by Bill Shannon [ 07/Oct/09 ]
We're going to do several things to improve the situation here:

1. Tim is going to change the autodeploy code to not recreate the domain directory.
2.
Byron is going to change the server shutdown code to remove the pid file as one of the last steps before the server shuts down.
3. I'm going to change the CLI stop-domain command to wait until the pid file disappears before declaring that the server is shut down.

Together I think these changes, while not a perfect solution, will sufficiently resolve the problem.

Comment by Tim Quinn [ 07/Oct/09 ]
Checked in fixes for auto-deployer.

Author: tjquinn
Date: 2009-10-08 00:47:43+0000
New Revision: 32422

Modified:
trunk/v3/deployment/autodeploy/src/main/java/org/glassfish/deployment/autodeploy/AutoDeployedFilesManage
trunk/v3/deployment/autodeploy/src/main/java/org/glassfish/deployment/autodeploy/AutoDeployer.java

Comment by Bill Shannon [ 08/Oct/09 ]
Final fixes checked in.

Comment by sankarpn [ 18/Nov/09 ]
Verified.

Generated at Mon Mar 07 08:23:34 UTC 2016 using JIRA 6.2.3#6260sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.