BKR-612 - Puppet Labs Tickets


[BKR-612]

create_remote_file via rsync intermittently fails with "rsync error: unexplained error (code 255)"

Created: 2015/11/03 Updated: 2016/01/04

Status: Ready for Engineering
Project: Beaker
Component/s: None
Affects Version/s: BKR 2.27.0
Fix Version/s: None
Type: Bug
Priority: Normal
Reporter: Rick Bradley
Assignee: Rick Bradley
Resolution: Unresolved
Votes: 0
Labels: transient
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Issue Links:
Duplicate
Relates
 relates to BKR-471 ssh stores fingerprint in known_hosts... (Ready for Engineering)
 relates to BKR-614 Transient test failure: reconnecting ... (Resolved)
Template: customfield_10700 true
Epic Link: Beaker 2016Q1
Story Points: 2
Sprint: QE 2015-11-11

Description

This appears to be related to "host changes IP on reboot" issues like / QENG-3119. I believe the relevant error message is the one below:

build: http://jenkins-beaker.delivery.puppetlabs.net/job/qe_beaker_intn-sys_beaker-acceptancebase-vpool/1090/
log: http://jenkins-beaker.delivery.puppetlabs.net/job/qe_beaker_intn-sys_beaker-acceptance-basevpool/1090/agent=debian7/consoleFull

Intermittent test failure! See: https://tickets.puppetlabs.com/browse/BKR-XXX

Debugging information:
  contents => "This is a simple text file.\n\nIt has three lines.\n"
  host => nthd5wil6r11a7m.delivery.puppetlabs.net
  remote_filename => "/tmp/.h7mTj2/testfile.txt"
  remote_tmpdir => "/tmp/.h7mTj2"
  result => [#<Rsync::Result:0x00000002c0e4d0 @raw="@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe fingerprint for the RSA key sent by the remote host is\n40:b9:7e:1a:a6:d1:2a:01:88:7d:28:64:61:0b:16:e4.\r\nPlease contact your system administrator.\r\nAdd correct host key in /var/lib/jenkins/.ssh/known_hosts to get rid of this message.\r\nOffending RSA key in /var/lib/jenkins/.ssh/known_hosts:54\r\n remove with: ssh-keygen -f \"/var/lib/jenkins/.ssh/known_hosts\" -R 10.32.124.105\r\nPassword authentication is disabled to avoid man-in-the-middle attacks.\r\nKeyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.\r\n<f+++++++++ beaker20151103-13941mh8er1\n", @exitcode=0>,
    #<Rsync::Result:0x00000002c6cfd0 @raw="ssh: connect to host 10.32.112.172 port 22: Connection timed out\r\nrsync: connection unexpectedly closed (0 bytes received so far) [sender]\nrsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]\n", @exitcode=255>]

rsync returned #<Rsync::Result:0x00000002c6cfd0 @raw="ssh: connect to host 10.32.112.172 port 22: Connection timed out\r\nrsync: connection unexpectedly closed (0 bytes received so far) [sender]\nrsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]\n", @exitcode=255>

My hypothesis is that when a host changes IP on reboot, beaker's cached SSH host keys are being used, for rsync only, in a way that aborts the rsync process because of the host key change, yet the abort does not bubble up as a command failure.
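The hypothesis notes that the rsync abort does not bubble up as a command failure. Below is a minimal sketch of surfacing a nonzero rsync exit code as an exception. `RsyncResult`, `RsyncCommandFailure`, and `check_rsync_results!` are hypothetical names, not the rsync gem's or Beaker's actual API. Note that in the log above the first transfer reported `@exitcode=0` despite the host key warning, so an exit-code check like this one would only catch the second, timed-out connection.

```ruby
# Hypothetical stand-in for an rsync result; the real rsync gem class may differ.
RsyncResult = Struct.new(:raw, :exitcode) do
  def success?
    exitcode.zero?
  end
end

# Hypothetical exception type, standing in for Beaker's usual command-failure error.
class RsyncCommandFailure < StandardError; end

# Return the results unchanged if every transfer succeeded;
# otherwise raise with the exit code and raw output of the first failure.
def check_rsync_results!(results)
  failed = results.find { |r| !r.success? }
  return results if failed.nil?

  raise RsyncCommandFailure,
        "rsync exited with code #{failed.exitcode}: #{failed.raw.strip}"
end

results = [
  RsyncResult.new("<f+++++++++ testfile.txt\n", 0),
  RsyncResult.new("rsync error: unexplained error (code 255) at io.c(605)\n", 255)
]

begin
  check_rsync_results!(results)
rescue RsyncCommandFailure => e
  # prints "rsync exited with code 255: rsync error: unexplained error (code 255) at io.c(605)"
  puts e.message
end
```

Raising on the first failure keeps the caller's error handling identical to any other failed command, which is the behavior the ticket asks for.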

If that's true, a few things could be improved here:

- the host keys could be cleaned up on reboot, or treated as not preventing the connection (as might already be happening with the rest of beaker's SSH transport activity?)
- failures here could bubble up as normal Beaker command-failed exceptions
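For cleaning up host keys on reboot, the log itself points at the standard tool: `ssh-keygen -f "/var/lib/jenkins/.ssh/known_hosts" -R 10.32.124.105`. As a minimal pure-Ruby sketch of the same idea, the hypothetical `remove_known_host` below drops known_hosts entries for one host. It assumes unhashed entries and leaves hashed (`|1|...`) and marker (`@...`) lines untouched; `ssh-keygen -R` also handles hashed entries, so it is the safer choice in practice.

```ruby
# Hypothetical helper: return known_hosts lines without entries for `host`.
# Assumes unhashed entries ("host1,host2 keytype key"); comments, blank
# lines, hashed entries ("|1|...") and marker lines ("@...") are kept as-is.
def remove_known_host(lines, host)
  lines.reject do |line|
    entry = line.strip
    next false if entry.empty? || entry.start_with?("#", "@", "|")

    entry.split.first.split(",").include?(host)
  end
end

# Placeholder key material, for illustration only.
known_hosts = [
  "10.32.124.105 ssh-rsa AAAAB3Nza...\n",
  "otherhost,10.0.0.1 ssh-rsa AAAAB3Nzb...\n",
  "# managed by CI\n"
]

puts remove_known_host(known_hosts, "10.32.124.105")
```

To apply it to a real file, something like `File.write(path, remove_known_host(File.readlines(path), ip).join)` would work, run after the reboot and before the next connection.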

Comments

Comment by Rick Bradley

[ 2015/11/03 ]

In beaker, for SSH transports, we disable strict host key checking here:

https://github.com/puppetlabs/beaker/blob/qeng-3063/flush-output-to-prevent-reboottimeout-problems/lib/beaker/host.rb#L469-L471

    # We disable prompt when host isn't known
    ssh_args << "-o 'StrictHostKeyChecking no'"

Judging by threads like this one (and a partial spelunk through the beaker and rsync gem code): http://stackoverflow.com/questions/20816547/stricthostkeychecking-not-ignoringfingerprint-validation

My guess is that we also need to disable strict host key checking for rsync connections to handle this situation.
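A minimal sketch of what disabling strict host key checking for rsync could look like, assuming rsync is invoked with an `-e` remote-shell string (a standard rsync option). `rsync_argv` is a hypothetical helper, not Beaker's code. It uses the `StrictHostKeyChecking=no` form so the option and its value stay one token, and joins tokens with plain spaces because rsync's own `-e` parsing honors quotes but not backslashes, which rules out `Shellwords`-style escaping.

```ruby
# Hypothetical helper (not Beaker's implementation): build an rsync argv whose
# remote shell skips strict host key checking, like Beaker's ssh transport.
# Assumes no element of ssh_opts contains spaces, since the tokens are joined
# with plain spaces for rsync's -e parsing.
def rsync_argv(local_path, user, host, remote_path, ssh_opts: [])
  remote_shell = ["ssh", "-o", "StrictHostKeyChecking=no", *ssh_opts].join(" ")
  ["rsync", "-az", "-e", remote_shell, local_path, "#{user}@#{host}:#{remote_path}"]
end

argv = rsync_argv("testfile.txt", "root", "10.32.124.105", "/tmp/.h7mTj2/testfile.txt")
p argv
# Passing the array form to Kernel#system avoids a local shell re-splitting it:
# system(*argv)
```

Building an argv array rather than a single command string means only rsync parses the `-e` value, so there is one layer of quoting to reason about instead of two.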

Comment by Kevin Imber

[ 2015/11/04 ]

Rick Bradley, @electrical (I can't get the JIRA name reference to work) is suggesting in BKR-471 that we stop storing the signatures in the local user's known_hosts file. Do you think that could be a solution to this issue?

It seems to me that in that case the issue wouldn't even come up, but we might then have session issues elsewhere that require us to manually approve the new signature on host creation and reboot. I'm not sure whether that's a real issue, or, if it is, how hard it would be to work around.
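A minimal sketch of the BKR-471 idea, assuming it would be done with standard OpenSSH client options rather than anything Beaker-specific: pointing `UserKnownHostsFile` and `GlobalKnownHostsFile` at `/dev/null` means no host keys persist between connections, and `StrictHostKeyChecking=no` accepts unknown keys without a prompt, which addresses the manual-approval concern. `ephemeral_ssh_opts` is a hypothetical name. The trade-off is that ssh then offers no protection against a genuine man-in-the-middle.

```ruby
# Hypothetical helper: OpenSSH client options that neither consult nor persist
# host keys, so a host that comes back on a reused IP never matches a stale entry.
def ephemeral_ssh_opts
  [
    "-o", "StrictHostKeyChecking=no",       # accept unknown host keys without prompting
    "-o", "UserKnownHostsFile=/dev/null",   # per-user known hosts read from/written to /dev/null
    "-o", "GlobalKnownHostsFile=/dev/null", # ignore the system-wide list too
    "-o", "LogLevel=ERROR"                  # hide "Permanently added ..." warnings
  ]
end

# The same options as a remote-shell string, e.g. for rsync's -e flag.
remote_shell = (["ssh"] + ephemeral_ssh_opts).join(" ")
puts remote_shell
```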

Comment by Rick Bradley

[ 2015/11/04 ]

I saw that. I'm pretty sure I've resolved the signatures problem for rsync by fixing a bug where "-o StrictHostKeyChecking no" needed to be "-o StrictHostKeyChecking=no" instead. There is still another problem, which I think may be related to connections to stale IPs. I'll investigate that further.

There may still be advantages unrelated to rsync to not storing signatures, as BKR-471 proposes.
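The "=no" fix can be illustrated with Ruby's standard `Shellwords` module. This is a minimal sketch assuming that, somewhere between Beaker and ssh, the option string is word-split after its inner quotes have been lost; the exact path through the rsync gem is not confirmed in this ticket. Without `=`, the `-o` argument becomes just `StrictHostKeyChecking` and `no` is left as a stray positional argument; with `=`, name and value survive as one token.

```ruby
require "shellwords"

# Space-separated option whose quotes were lost: "no" splits off
# as its own token instead of staying with StrictHostKeyChecking.
broken = Shellwords.split("ssh -o StrictHostKeyChecking no")

# "=" form: the option name and value stay together as one token.
fixed = Shellwords.split("ssh -o StrictHostKeyChecking=no")

p broken # => ["ssh", "-o", "StrictHostKeyChecking", "no"]
p fixed  # => ["ssh", "-o", "StrictHostKeyChecking=no"]

# Quotes protect the space for one round of splitting...
quoted = Shellwords.split("ssh -o 'StrictHostKeyChecking no'")
p quoted # => ["ssh", "-o", "StrictHostKeyChecking no"]
# ...but a hypothetical later layer that re-joins with spaces and splits again loses them.
p Shellwords.split(quoted.join(" ")) # => ["ssh", "-o", "StrictHostKeyChecking", "no"]
```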

Comment by Rick Bradley

[ 2015/11/04 ]

Note: closing QENG-3053 as a duplicate of this, which also keeps the work here in the BKR project.

The error message there was slightly different ("rsync error: error in rsync protocol data stream (code 12)"), but I'm not convinced it's different enough to warrant two tickets.

Comment by Kevin Imber

[ 2016/01/04 ]

Bulk moved this issue from beaker's 2015 Q4 epic to its 2016 Q1 epic.

If you have any questions about this, feel free to contact me.

Thanks,

Ki

Generated at Tue Feb 09 10:32:27 PST 2016 using JIRA 6.4.12#64027sha1:e3691cc1283c0f3cef6d65d3ea82d47743692b57.
