esb-amazon-infrastructure

Architecture
[Figure: architecture diagram. ServiceMix and ActiveMQ run on EC2 instances in two availability zones (us-east-1a and us-east-1b), backed by an RDS Oracle database with a hot standby. One Elastic Load Balancer fronts ServiceMix (port 443 for CXF and the Karaf web console; health monitor on 8101, the Karaf SSH console). A second fronts ActiveMQ (port 443 for the ActiveMQ web console, port 61617 for ActiveMQ; health monitor on 61616).]
Network Architecture
Our network architecture aims to be simple but still provide for disaster recovery and keep application
layers in separate networks.
For each stack we create a VPC and give it the entire /24 network allocated to us by Harvard UNSG.
We break the /24 network into four /26 networks. Each /26 holds 64 addresses, of which 59 are usable because AWS reserves five addresses in every subnet. We use two of these subnets for Oracle RDS and the other two for EC2 instances. For each application layer we put one subnet in the us-east-1a availability zone and the other in the us-east-1b availability zone.
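The /26 arithmetic can be checked with Python's standard ipaddress module. This is a minimal sketch, assuming AWS's rule of five reserved addresses per subnet; the function name plan_subnets is hypothetical.

```python
import ipaddress

# AWS reserves five addresses in every VPC subnet: the first four
# (network address, VPC router, DNS, reserved for future use) and the last.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr, new_prefix=26):
    """Split vpc_cidr into equal subnets; return (cidr, usable_addresses) pairs."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [
        (str(subnet), subnet.num_addresses - AWS_RESERVED_PER_SUBNET)
        for subnet in vpc.subnets(new_prefix=new_prefix)
    ]


if __name__ == "__main__":
    for cidr, usable in plan_subnets("10.39.8.0/24"):
        print(f"{cidr}: {usable} usable addresses")
```

For the TEST VPC this yields 10.39.8.0/26, 10.39.8.64/26, 10.39.8.128/26 and 10.39.8.192/26, each with 59 usable addresses.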
The elastic load balancers get Harvard IP addresses from the same subnets as our EC2 instances.
Here is our TEST stack:
[Figure: TEST stack, VPC 10.39.8.0/24 spanning us-east-1a and us-east-1b, connected by Direct Connect to the Harvard Data Center at 60 Oxford St. EC2 subnets 10.39.8.0/26 and 10.39.8.64/26 hold atsesbtest1.cadm.harvard.edu (IP 10.39.8.10) and atsesbtest2.cadm.harvard.edu (IP 10.39.8.70), each running ServiceMix and ActiveMQ. RDS subnets 10.39.8.128/26 and 10.39.8.192/26 hold RDS Oracle and its hot standby. Internal ELBs: atsesbtest.cadm.harvard.edu (10.39.8.54, 10.39.8.77) and atsesbtestmq.cadm.harvard.edu (10.39.8.22, 10.39.8.119).]
We have three security groups: one each for EC2 instances, RDS instances and Elastic Load Balancers.
Building Infrastructure
Create VPC
First create the VPC with the assigned CIDR. We used 10.39.8.0/24 for TEST and 10.37.8.0/24 for PROD.
Select the new VPC and then under the Actions button click Edit DNS resolution and Edit DNS hostnames
and ensure both are set to Yes.
Create an internet gateway (Internet Gateways link on left navigation pane of VPC console) and
associate it with the VPC:
Create one or more subnets (Subnets link in left navigation pane) in the VPC, each with a smaller CIDR
and in the appropriate availability zone. For TEST we split the /24 CIDR granted by Harvard UNSG into
four subnets: 10.39.8.0/26 and 10.39.8.64/26 for TEST EC2 instances, and 10.39.8.128/26 and
10.39.8.192/26 for TEST RDS instances.
Repeat for the other 3 subnets, making sure we have two EC2 subnets in two different availability zones
and two RDS subnets in two different availability zones (probably the same two availability zones we
used for the EC2 instances).
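The subnet rules above (every subnet inside the VPC, no overlaps, each layer in two availability zones) can be checked mechanically before clicking through the console. This is a minimal sketch: the CIDRs come from this document, but the availability zone assigned to each subnet in TEST_PLAN is an assumption, and check_plan is a hypothetical helper.

```python
import ipaddress
from collections import defaultdict

# Hypothetical TEST subnet plan: (application layer, availability zone).
# The CIDRs come from this document; the AZ for each subnet is assumed.
TEST_PLAN = {
    "10.39.8.0/26": ("ec2", "us-east-1a"),
    "10.39.8.64/26": ("ec2", "us-east-1b"),
    "10.39.8.128/26": ("rds", "us-east-1a"),
    "10.39.8.192/26": ("rds", "us-east-1b"),
}


def check_plan(vpc_cidr, plan):
    """Return a list of problems with a subnet plan (empty list means OK)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {cidr: ipaddress.ip_network(cidr) for cidr in plan}
    problems = []

    # Every subnet must sit inside the VPC CIDR.
    for cidr, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{cidr} is outside {vpc_cidr}")

    # No two subnets may overlap.
    cidrs = list(nets)
    for i, a in enumerate(cidrs):
        for b in cidrs[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} overlaps {b}")

    # Each layer (ec2, rds) needs subnets in at least two AZs.
    zones = defaultdict(set)
    for layer, az in plan.values():
        zones[layer].add(az)
    for layer, azs in zones.items():
        if len(azs) < 2:
            problems.append(f"all {layer} subnets are in {sorted(azs)[0]}")
    return problems
```

check_plan("10.39.8.0/24", TEST_PLAN) returns an empty list; moving both RDS subnets into one zone produces a single complaint about the rds layer.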
Add routes and subnet associations. Select the Route Tables item in the left navigation pane and
highlight the route table for your VPC. Select the Routes tab in the bottom pane, click the Edit button
and add a route for network 0.0.0.0/0 whose target is the gateway you created earlier.
Now select the Subnet Associations tab and click the Edit button. Associate your subnets with the route
table and click Save.
Note that we've associated only the EC2 subnets. The RDS subnets don't need to reach the internet outside of Amazon, so we leave them unassociated.
Leave the Network ACLs for your new subnets alone. It is tempting to use inbound rules to restrict
inbound connections, but this would filter inbound packets even for connections that were initiated
from inside the network because this firewall is stateless. For example, if I want to be able to connect to
http://google.com from within one of my EC2 instances, then google.com would have to be whitelisted
on the inbound rules for the HTTP response to make it back to me. Instead, just leave it at "allow all".
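The difference between a stateless filter (network ACL) and a stateful one (security group) can be sketched in a few lines. This is a simplified model, not AWS's implementation: 203.0.113.5 is a documentation address standing in for google.com, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int


def stateless_allows(pkt, allowed_inbound_ports):
    """Stateless filter (network ACL style): each inbound packet is judged
    on its own, with no memory of connections the instance opened."""
    return pkt.dst_port in allowed_inbound_ports


class StatefulFilter:
    """Stateful filter (security group style): replies to connections that
    were opened from inside are let back in automatically."""

    def __init__(self, allowed_inbound_ports):
        self.allowed = set(allowed_inbound_ports)
        self.connections = set()

    def outbound(self, pkt):
        # Remember the 4-tuple of every connection opened from inside.
        self.connections.add((pkt.src, pkt.src_port, pkt.dst, pkt.dst_port))

    def inbound_allows(self, pkt):
        # A reply swaps source and destination relative to the original packet.
        reply_to = (pkt.dst, pkt.dst_port, pkt.src, pkt.src_port)
        return reply_to in self.connections or pkt.dst_port in self.allowed
```

If an instance at 10.39.8.10 opens port 50000 to 203.0.113.5:80 and only port 22 is allowed inbound, the HTTP reply (arriving on port 50000) is dropped by stateless_allows but accepted by StatefulFilter. With a stateless ACL you would have to open the ephemeral port range or whitelist every remote host, which is why the ACL is left at "allow all".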
Configure Direct Connect
Alex Manoogian in the UNSG helped us get configured. We had to go to the VPC panel and create a virtual private gateway, then go to the Direct Connect panel and accept the new interface. Here are screenshots:
We then went into the Route Tables console, selected the routing for our VPC, selected the Route
Propagation tab and pressed the Edit button.
Check the Propagate checkbox so that we pick up routing information from Harvard.
Click Save and select the Routes tab. You should see the routes pushed by Harvard.
Once the routing is in place, any existing SSH connections you have to EC2 instances will hang, and you will no longer be able to create new connections to your instances. This is because traffic between your laptop and the instances now takes a different route (over Direct Connect), since both have private 10.x.x.x IP addresses. You have to add inbound rules to your security group which permit access from the private IP address space of the OAS VPN, 10.11.82.0/24.
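VPC route tables pick the most specific route that matches a destination (longest prefix match), which is why propagated Harvard routes win over the 0.0.0.0/0 internet route for 10.x addresses. A minimal sketch: the local and 0.0.0.0/0 entries come from this document, but the 10.0.0.0/8 entry is a hypothetical stand-in for whatever Harvard actually propagates.

```python
import ipaddress

# Hypothetical route table after propagation. Only 10.39.8.0/24 (local) and
# 0.0.0.0/0 (internet gateway) come from this document; 10.0.0.0/8 stands
# in for the routes Harvard pushes over Direct Connect.
ROUTES = [
    ("10.39.8.0/24", "local"),
    ("0.0.0.0/0", "igw"),
    ("10.0.0.0/8", "vgw (Direct Connect)"),
]


def next_hop(dest_ip, routes=ROUTES):
    """Pick the most specific (longest-prefix) route that contains dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    _network, target = max(matches, key=lambda m: m[0].prefixlen)
    return target
```

A reply to a laptop on the OAS VPN (for example 10.11.82.20) now leaves through the virtual private gateway rather than the internet gateway, so the instance sees the laptop's private address and the security group must allow 10.11.82.0/24.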
Adding Direct Connect introduced one issue. Here is an email thread describing it:
Alex Manoogian and I have spent a fair amount of time experimenting with Amazon Direct
Connect as currently configured and we have run into an issue where internet-facing Load
Balancers do not respond to requests from hosts on the oasadmin VPN (or indeed from any host
with a Harvard IP address).
Alex has identified the issue as being with the Load Balancers acting as a proxy, but rewriting the
source IP address for the second (proxy) request with the originating machine's IP, rather than the
Load Balancer's IP.
A proxied request from my laptop to ServiceMix might look like the following:
Mike's laptop --> Load Balancer --> ServiceMix
Since the Load Balancer (LB) is internet-facing it has an Amazon IP address and the request from
laptop to LB travels over the public internet. The LB then makes an identical proxy request of
ServiceMix, identifying my laptop as the source of this second request. When ServiceMix
responds, it thinks it should respond to the IP address of the laptop, rather than the IP address of
the ELB. However it has two routes to the laptop: 1. back the way it came, through the proxy and
the public internet, or 2. over the Direct Connect. Direct Connect appears to be the shortest route
and that is where it goes. The Load Balancer never gets a chance to receive the proxy response
and send a corresponding response to the laptop. The response transmitted over Direct Connect
arrives at the laptop, but the laptop doesn't know what to do with it because it didn't come from
the LB.
We have a few options:
1. Make our Load Balancers "Internal" rather than "Internet Facing". This means the load
balancers get Harvard IP addresses rather than Amazon IP addresses so all traffic goes over the
Direct Connect and everything works as expected. This means, however, that external
organizations would not have any access to the Load Balancers. We couldn't do integrations with
systems outside of Harvard (like vendors, etc.) without additional hardware (see #2).
2. We could make LBs internal, but add infrastructure to expose them to the public. We could set
up NATed access to the internal LBs in much the same way we expose hosts on a secure subnet at
60 Oxford St, or we could provide both internal LBs and internet facing LBs. Either option would
add cost; we'd require an extra EC2 instance to function as the NAT host, or we'd need two more
load balancers.
3. Alex suggested we could have customized route tables provided to us by the NOC. This would
mean that instead of telling our cloud how to get to Harvard as a whole, the NOC would instead
tell it how to get to each host we need to access. For example, we were able to remove the
oasadmin VPN from the list of Harvard routes pushed to our cloud, and suddenly the LBs started
working correctly from my laptop. I'm not sure this would work for all connections, because
effectively it is forcing the connection to go outside of the Direct Connect and in some cases we
need it to go through the Direct Connect (if a database server, for example, does not have a public
IP address). Furthermore, this would be like ACL hell.
4. We could wait and see if there is a change in network strategy. There is talk of forcing all access
to Amazon, either to Harvard IPs or to Amazon IPs, through the Direct Connect, which would
mean this issue just goes away.
We are going with option #1 for the time being, and when/if we need public access then we can
hope #4 has occurred, or we can spend the money to do #2.
Configure ACLs and Default Security Group for the VPC
Now we configure ACLs and security groups for the VPC.
First, select Network ACLs in the left navigation pane of the VPC console. A default entry has been
created for your new VPC. This is just a different view into the same ACLs we saw in the Subnets page.
As before, leave the ACLs wide open: do not change the inbound rules. You can click in the Name
column to rename the ACL set if you wish.
Next, select Security Groups in the left navigation pane of the VPC console. This is where we create rules
like the conventional stateful network ACLs we normally use at Harvard.
A default security group was created along with your VPC; we'll use it for our EC2 instances (we will later create additional security groups for load balancers and for RDS). Select the default security group and rename it by clicking in the Name column as before. Choose the Inbound Rules tab and open the ports you will need access to:
Note that in addition to the ports we are opening up to the outside world (22, 8101, 8181, 8161, 61616)
we have also granted access to all incoming traffic coming from the group itself (see red circles). This
lets EC2 instances talk to other EC2 instances, even though they may be in different subnets (i.e.
availability zones). This is particularly important once we bring elastic load balancers into play because
the balancer in one subnet might need to route traffic to an EC2 in another subnet. We could refine this
rule to particular ports (because the load balancers are only active on a small set of ports) but it doesn't
seem worth doing.
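A security group rule can name either a CIDR block or another security group (including itself) as its source. The sketch below models that evaluation to show why the self-referencing rule lets instances in different subnets reach each other on any port. It is a simplified model under stated assumptions: the source for the five listed ports is assumed to be the OAS VPN, 10.11.82.0/24 (the real rule set is in the screenshot), and Rule and sg_allows are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    port: Optional[int]          # None means "all ports"
    cidr: Optional[str] = None   # source given as a CIDR block...
    group: Optional[str] = None  # ...or as a security group name


def sg_allows(rules, src_ip, src_groups, port):
    """True if any inbound rule matches the source and destination port."""
    for rule in rules:
        if rule.port is not None and rule.port != port:
            continue
        if rule.cidr and ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.cidr):
            return True
        if rule.group and rule.group in src_groups:
            return True
    return False


# Assumed source for the listed ports: the OAS VPN.
VPN = "10.11.82.0/24"
EC2_RULES = [Rule(p, cidr=VPN) for p in (22, 8101, 8181, 8161, 61616)]
# Self-reference: any member of esb-test-ec2 may reach any port.
EC2_RULES.append(Rule(None, group="esb-test-ec2"))
```

A laptop on the VPN can reach port 8181 but not an arbitrary port, while atsesbtest2 (10.39.8.70, a member of esb-test-ec2) can reach any port on atsesbtest1 even though it sits in the other subnet.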
DHCP
In a VPC you can configure Amazon's DHCP to set up custom DNS and NTP servers. Go to the VPC
console and select DHCP Option Sets in the left navigation. Add a new option set as follows:
This identifies Harvard's DNS servers and the IP address of Harvard's NTP server, time.harvard.edu. Now
navigate back to Your VPCs, select the VPC of interest and under the Actions button choose Edit DHCP
Options Set.
Select the DHCP options set you just created and press Save:
Create the Database (RDS) Instance
We already have two VPC subnets aimed at RDS usage in two different availability zones, created in the
previous section.
We took the following steps to create an RDS instance for the ATSESB pilot.
Create Security Group
In the VPC console choose Security Groups in left navigation pane, and create a VPC security group
specifically for RDS instances and associated with our VPC. We named it "esb-test-rds":
Go to the RDS console, choose Subnet Groups in the navigation pane then click Create DB Subnet Group.
Add the two subnets which are aimed at RDS to the DB subnet group.
Create Option Group
We want to encrypt all data at rest, which means we need to use Oracle EE with the TDE (transparent
data encryption) option. To enable TDE, you must create an AWS option group. Go to the RDS console
and select Option Groups in the left navigation menu. Click the Create Group button and create an
Oracle EE option group:
Now select the new group, press the Add Option button and add the TDE option to your group:
Create RDS Instance
Now create an RDS instance. Choose Oracle EE:
Answer "Yes" when asked whether the database is for production purposes:
Choose a smallish instance (I chose "db.m1.small") and storage (10 GB was the minimum):
Since we are deploying to Harvard IP addresses, we must set Publicly Accessible to "No" to get access to
the database using TOAD (you can't change this later). Also make sure you select your VPC, Subnet
Group, the Security Group we created earlier, and the Option Group we just created:
Press the Create Instance button and after about 15 minutes you will have a database. Note the
connection parameters are available on the main RDS instances list:
To get access to the DB from a desktop using TOAD we have to adjust the rules in the security group we created at the beginning of this section. In the VPC dashboard, under Security Groups, we added to the esb-test-rds group inbound access to port 1521 from 10.11.82.0/24 (the oasadmin VPN), and confirmed we could connect with "telnet esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com 1521". We also added a rule permitting the hosts in the esb-test-ec2 subnets (actually the whole 10.39.8.0/24 network) to access the RDS instance:
Finally we set up an entry in Oracle's tnsnames.ora on our desktops:
ATSESBTEST =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)
        (HOST = esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com)
        (PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = ORCL)
    )
  )
and were able to connect using a TOAD command line like:
"C:\Program Files (x86)\Quest Software\TOAD for Oracle 12\Toad.exe" -c "amq/secret@ATSESBTEST"
Note the database host name "esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com" is not random.
If you delete the database and recreate with the same DB Instance Identifier, "esb-test-amq", the
resulting DB will have the same host name.
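The telnet reachability test used earlier in this section can also be scripted, which is handy from a host without a telnet client. This is a minimal sketch using only the standard socket module; port_open is a hypothetical helper, and it only proves a TCP handshake succeeds, not that the Oracle listener accepts a login.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds (like the telnet test)."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

From a desktop on the oasadmin VPN, port_open("esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com", 1521) should return True once the esb-test-rds rule is in place, and False (after the timeout) if the rule is missing.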
You can confirm TDE is enabled with:
SELECT * FROM v$encryption_wallet;
Now create an encrypted tablespace (you can't alter an existing tablespace to encrypt it) and set it as the default tablespace for the AMQ user:
CREATE TABLESPACE CRYPT_TS ENCRYPTION USING 'AES256' DEFAULT STORAGE (ENCRYPT);
ALTER USER AMQ default tablespace CRYPT_TS quota unlimited on CRYPT_TS;
You can leave the empty and now-unused USERS tablespace in place; it does not consume enough space to worry about.
Once you start ActiveMQ you should see that it creates three tables in the CRYPT_TS tablespace:
Future work: once we have created the RDS database in its final incarnation we will open a ticket with
the SOC to create an onames LDAP entry for this database so that modifying our tnsnames.ora files is
not necessary.
Create EC2 Instances in the VPC
Now create an EC2 instance in the new VPC.
In the EC2 console choose the Amazon Linux AMI:
I chose m3.medium instances:
In the next screen make sure you select the correct VPC in the Network field, and one of the subnets you
created earlier. Also be sure you assign a public IP (without a public IP you can't make outbound internet
requests from the instance). You can also specify the private (Harvard) IP address you want for the
instance. This IP will still be allocated via DHCP, but will be reserved for this instance. We've specified
private IPs so we can create DNS CNAME records. For esbtest1 we're using 10.39.8.10 and for esbtest2
we're using 10.39.8.70.
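Before typing a private IP into the launch wizard it is worth confirming that the address belongs to the chosen subnet and is not one of the addresses AWS reserves. This is a minimal sketch under the AWS rule that the first four addresses and the last address of each subnet are reserved; valid_private_ip is a hypothetical helper.

```python
import ipaddress


def valid_private_ip(ip, subnet_cidr):
    """True if ip lies in the subnet and is not one of the five addresses
    AWS reserves (the first four and the last one)."""
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(subnet_cidr)
    if addr not in net:
        return False
    offset = int(addr) - int(net.network_address)
    return 4 <= offset < net.num_addresses - 1
```

Both chosen addresses pass: 10.39.8.10 in 10.39.8.0/26 and 10.39.8.70 in 10.39.8.64/26. Typing 10.39.8.70 while the first subnet is selected, or picking 10.39.8.2 or 10.39.8.63, would fail.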
I took the default storage:
Give the instance an appropriate name:
Select the security group you configured earlier:
Repeat the previous section and create a second EC2 instance, identical to the first, except for host
name and IP address. Make sure you create it in the other EC2 subnet so it runs in a second availability
zone.
Create Load Balancers
Create Security Group
Create a new security group specifically for the load balancers you will be creating, so we can tweak the access controls independently of those for the backing EC2 instances. Name it "esb-test-elb". Open ports 443 and 61617 to the OAS VPN. I've also opened all ports to connections from our VPC (circled in red below) for developer convenience.
Update our existing esb-test-ec2 (default) security group to permit access to the EC2 instances from the
ELB:
Create Load Balancers
Create Load Balancers in the EC2 console. We need two load balancers: one for ServiceMix and one for
ActiveMQ.
Load Balancer   Health check port   Load Balance Ports
esbtest-smix    8101                443 -> 8181 (SSL), 8181 -> 8181
esbtest-amq     61616               443 -> 8161 (SSL), 61616 -> 61616, 61617 -> 61616 (SSL)
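Written down as data, the table tells you which ports the esb-test-ec2 group must accept from the esb-test-elb group (every backend port plus every health-check port) and which ports clients hit on the load balancers. This is a minimal sketch; the LISTENERS structure and helper names are hypothetical, and the SSL flag on the ActiveMQ 443 listener follows the SSL section later in this document.

```python
# The listener table above: (frontend port, backend port, SSL terminated?).
LISTENERS = {
    "esbtest-smix": {
        "health_port": 8101,
        "listeners": [(443, 8181, True), (8181, 8181, False)],
    },
    "esbtest-amq": {
        "health_port": 61616,
        "listeners": [(443, 8161, True), (61616, 61616, False), (61617, 61616, True)],
    },
}


def backend_ports(config):
    """Ports the EC2 security group must accept from the ELB security group:
    every backend port plus every health-check port."""
    ports = set()
    for lb in config.values():
        ports.add(lb["health_port"])
        ports.update(backend for _front, backend, _ssl in lb["listeners"])
    return sorted(ports)


def frontend_ports(config):
    """Ports clients connect to on the load balancers themselves."""
    return sorted({front for lb in config.values() for front, _b, _s in lb["listeners"]})
```

For this table the EC2 group needs 8101, 8161, 8181 and 61616 from the load balancers, and the load balancers listen on 443, 8181, 61616 and 61617.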
To create the ActiveMQ load balancer, select the Load Balancers link in the left navigation pane of the
EC2 console. Press the Create Load Balancer button.
You will select the VPC and the ports you wish to balance. In this case we chose our TEST VPC and ports
8161 and 61616, the ActiveMQ ports:
Note above that we choose to create an "Internal" load balancer. This will get a Harvard IP, so it will not
be accessible outside Harvard. We did this because of routing issues with publicly accessible load
balancers and Direct Connect.
Since one of the ports we're exposing is SSL, we are prompted for a certificate and cipher. Here we reuse
an existing certificate, but we could have chosen to enter a new one:
You will configure how AWS will determine if a node is healthy (here we monitor port 61616):
In the above dialog we also select settings for how often Amazon will confirm the nodes we are load balancing are up. We have set the health check interval to 10 seconds and both the healthy and unhealthy thresholds to 2, meaning it will take Amazon up to about 20 seconds to notice that a node is down, or that a node has come back up.
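The 20-second figure is the interval multiplied by the threshold. The sketch below makes that arithmetic explicit under a simplified model: a node changes state just after a check, the ELB then needs `threshold` consecutive checks spaced `interval_s` apart, and a check that fails by timing out can add up to the response timeout. The function name and the timeout term are assumptions for illustration.

```python
def detection_window(interval_s, threshold, timeout_s=0.0):
    """Approximate worst-case seconds before the ELB flips a node's state.

    The node must fail (or pass) `threshold` consecutive checks spaced
    `interval_s` apart; the last check can take up to `timeout_s` to fail
    if the node simply stops answering.
    """
    if interval_s <= 0 or threshold < 1:
        raise ValueError("interval must be positive and threshold at least 1")
    return interval_s * threshold + timeout_s
```

With the settings above, detection_window(10, 2) is 20 seconds; a 5-second response timeout would stretch the worst case to about 25.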
Next, select your subnets. Here we pick the two subnets where ActiveMQ will be running (the two
subnets we have targeted for EC2 instances):
Assign a security group. Here we've selected the group we created specifically for load balancers:
Finally, add EC2 instances to the load balancer pool:
When you are done, the ELB will look like the following. Here we see one ActiveMQ instance in service
and the other instance out of service, which is normal for ActiveMQ configured with a shared DB:
Next, create a second load balancer for ServiceMix by repeating the above steps for the same EC2 instances, but this time balancing port 8181 and monitoring port 8101.
Amazon creates a DNS "A" record for the load balancer IP address. We can create our own DNS "CNAME" record pointing to this "A" record if we want to give the load balancer a Harvard domain name. I've requested that the Harvard network group create CNAME records for the following:
atsesbtest.cadm.harvard.edu -> esb-test-smix-1028868478.us-east-1.elb.amazonaws.com
atsesbtestmq.cadm.harvard.edu -> esb-test-amq-883256672.us-east-1.elb.amazonaws.com
When creating an internal load balancer Amazon will randomly select IP addresses from the subnets you
provide. There doesn't appear to be any way to select specific IP addresses.
Terminating SSL on a Load Balancer
As described in the previous section, we've terminated SSL on the load balancers. The ELBs act as the SSL termination point and forward traffic to the EC2 instances in clear text.
Here is the atsesbtest.cadm.harvard.edu ELB with a port 443 listener (and an SSL certificate) that forwards to port 8181 on the EC2 instances:
Note that we have to add an inbound rule to the load balancer's security group permitting access to port 443.
In a similar manner we set up SSL on the atsesbtestmq.cadm.harvard.edu load balancer:
Here we've added two listeners: one on port 443 for the ActiveMQ web-based console, and one on port 61617 for the ActiveMQ broker itself.
If the application running on the EC2 instances issues redirects you have to be careful, because the application has no knowledge that clients reach it over HTTPS on port 443; it only sees the plain connection from the load balancer. For both ServiceMix and ActiveMQ this mattered. Here are a couple of URLs which did not work and the corresponding URLs which did:
Didn't work: https://atsesbtest.cadm.harvard.edu/system/console
Worked: https://atsesbtest.cadm.harvard.edu/system/console/bundles
Didn't work: https://atsesbtestmq.cadm.harvard.edu/admin
Worked: https://atsesbtestmq.cadm.harvard.edu/admin/
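The URLs that worked are the ones that need no redirect. The sketch below shows why a redirect breaks and one common alternative. It assumes the listeners are HTTPS listeners (Classic ELB adds X-Forwarded-Proto and X-Forwarded-Port only for HTTP/HTTPS listeners, not for SSL/TCP ones) and that the backend could be configured to honor those headers; this document does not show either, and both function names are hypothetical.

```python
from urllib.parse import urlunsplit


def naive_redirect(host_header, target_path):
    """Absolute redirect built by a backend that only sees the ELB's
    plain-HTTP connection: it assumes the client also used plain HTTP."""
    return urlunsplit(("http", host_header, target_path, "", ""))


def forwarded_redirect(headers, target_path):
    """Absolute redirect rebuilt from the X-Forwarded-Proto and
    X-Forwarded-Port headers that an ELB HTTP/HTTPS listener adds."""
    scheme = headers.get("X-Forwarded-Proto", "http")
    port = headers.get("X-Forwarded-Port")
    host = headers["Host"].split(":")[0]
    default_port = {"http": "80", "https": "443"}.get(scheme)
    # Only spell out the port when it is not the default for the scheme.
    netloc = host if port in (None, default_port) else f"{host}:{port}"
    return urlunsplit((scheme, netloc, target_path, "", ""))
```

For a request to https://atsesbtestmq.cadm.harvard.edu/admin, the naive backend redirects the browser to http://atsesbtestmq.cadm.harvard.edu/admin/, a plain-HTTP URL the load balancer does not serve, while the header-aware version keeps https.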