KM3NeT Electronics Production Readiness Review

Reviewers: Auguste Le Van Suu, Alberto Valero, Ruud Kluit, Frederic Louis

TABLE OF CONTENTS
1. GENERAL REMARKS
2. POWER BOARD (PB)
3. CENTRAL LOGIC BOARD (CLB)
4. OCTOPUSSES
5. NANOBEACON
6. CONCLUSIONS

1. General remarks

Discussion during the session
1.- Perform a complete functional test after the burn-in tests.
2.- Can manufacturing tests be added to the production procedure?
Ans: The test will include:
a.- Manufacturing tests: automatic visual inspection (AOI, preferably 3D) and JTAG connectivity for the CLB.
b.- Functional test: digital and analog values compared to upper/lower limits.
c.- Burn-in with functional tests: test before and after burn-in, or at least a limited test monitoring power consumption during burn-in.
The reviewers also suggested repeating the functional test after reception at KM3NeT (to verify the manufacturer's tests, e.g. on samples).
3.- What is the cooling power of the DOM?
Ans: To be asked inside KM3NeT and distributed to the reviewers.
4.- Define clear general manufacturing procedures:
- Perform a pre-series before mass production.
- Audit the companies to check their manufacturing process and assess quality.
- The manufacturer should verify all component replacements (e.g. due to long lead times or obsolescence) with the orderer.
- Use ODB++ data for PCBA data transfer to the manufacturer (not Gerber).
5.- What is the power margin on the DOM?
Ans: There is a 10 % margin (for normal operation). Beyond that, the power limit is set by the fuse at the 400 V-12 V DC/DC converter.
6.- Performing a FIDES analysis has helped to improve the board design (e.g. by finding underrated capacitors). This should be done for all board designs.
7.- A manufacturing process document has to be prepared or updated for all the boards, see item 4 (e.g. KM3NeT_ELEC_PRR_NANOBEACON_2014_007_Manufacturing_Procedure).

2. Power board (PB)

General remarks

Design
Modifications have been performed to adapt the ramp-up voltage to the Xilinx specifications.

Realization/Production PCB
It would be advisable to apply the IPC-A-600 rules, level 2 as a minimum, for the acceptance of the PCB production.

Realization/Production assembly components
As already mentioned, particular attention must be paid to the DC active (VSAD) modules. Soldering these components is very delicate and has to be done with great care. It would be advisable to apply the IPC-A-610 rules, level 2 as a minimum, for the acceptance of the component soldering.

Realization/Production test benches
For the mass production, an industrial test bench has to be built to test the power board under load and to check the correct ramp-up of the different voltages (the documentation does not indicate whether this has already been done). A burn-in operation should be performed.

Discussion during the session
1.- Verify proper operation of the PB with all the active boards connected and switched on (i.e. PMTs, piezo, Nanobeacon, etc.).
2.- Do the connected PMTs have current and voltage monitoring?
3.- Is there EMI on the PMTs due to the PB?
4.- A burn-in test of the CLB and the PB together would be preferable.
However, as there is no thermal management in this configuration that could stress the tested boards, burn-in tests of the individual boards would be enough to sort out infant mortality.
5.- Is there a fuse on the Power Board?
Ans: The fuse is on the 400 V-12 V DC/DC converter.
6.- Can ripple on the 400 V-12 V DC/DC converter affect operation of the Power Board? Please verify.
7.- Hysteresis: include a real measurement of the hysteresis in the test report.
8.- Test several PBs (with a realistic active load; a resistive load is not enough) together with the 400 V-12 V DC/DC converters and the DU base AD/DC, using long cables (as used in VEOC). Use the Catania and Nikhef 400 V/12 V DC/DC converters. Recommended before mass production.
9.- Describe in a document the values that have to be measured and the thresholds used to validate the test (e.g. voltage levels, maximum-current behavior, start-up time ranges, dissipation, etc.). Add this specific test specification to the general manufacturing requirements. Describe the required data format of the returned test results.

3. Central Logic Board (CLB)

General remarks

Design
Particular attention has to be paid to the initialization phase. Many problems occur during this phase and, unfortunately, the design can only be validated with a large number of cycles on many boards. The reconfiguration of the firmware is a very useful facility. The implementation of White Rabbit (WR) for the clock synchronization is an impressive design. The tests show very good synchronization results. Many tests have been performed on the CLB (power initialization, clock synchronization, thermal, EMC and WR synchronization). However, it seems that only one board has been evaluated. This point is particularly important because the WR distribution operates in broadcast mode in one direction. A test with more boards will be needed to fully validate this WR distribution and to estimate the effect of the broadcast on the resynchronization process.
It will also check that the clock synchronization on all boards meets the intended requirements in real conditions (with full PMT data and instrumentation rates).

Realization/Production PCB
As mentioned, the IPC-A-600 rules, level 2 as a minimum, will be used for the acceptance of the PCB production.

Realization/Production assembly components
As mentioned, the IPC-A-610 rules, level 2 as a minimum, will be used for the acceptance of the component soldering, and an X-ray inspection will be carried out.

Realization/Production test benches
For the mass production, an industrial test bench has to be built to test all the functionalities of the CLB. A test bench for this board is a very complex development (hardware and software) and will need a lot of time and human resources. Also, it seems very difficult to ask a customer to provide such a test bench. A burn-in operation should be performed. Particular attention has to be paid to the thermal dissipation of the FPGA and the SFP module of the CLB. For the PPM DOM, a mechanical piece was used to transfer the FPGA dissipation to the main mechanical support of the DOM. It would be useful to study such a heat sink to eliminate these hot spots.

Discussion during the session
1.- Repeat the test with the final WR structure: the final number of layers (switches), similar fiber lengths and optical elements, and several CLBs (i.e. 3-5). Duration of at least 48 hours. Data rate at the worst KM3NeT case. Check that there is no loss of UDP packets.
2.- Which soldering process is used?
Ans: A lead-free process. Xilinx provides lead-free FPGAs and recommends using lead-free paste. The paste used will be the one specified by Xilinx.
3.- Which manufacturing and functional tests are to be performed after production, before and after burn-in?
Ans:
- X-ray tests on BGA components (FPGA and FMC connector)
- Automatic visual inspection (AOI)
- JTAG tests
- Loopback test on connectors (Octopus and FMC)
- MGTX voltages test
- Flash memory test
- Quick functional test to check White Rabbit
- Test with supply voltage range (min, typ and max values, should match PB specifications)
4.- Is a test of the complete DOM electronics ensemble foreseen?
Ans: During integration of the DOMs. As M. Circella explained, DU-1 will be the first complete integration test for KM3NeT, but mass production of all components needs to start before this in order to meet the project deadline.
5.- It is recommended not to change the golden image if this is not completely safe (see 8.).
6.- Temperature on the CLB?
Ans: Less than 40 degrees for the FPGA and the SFP without cooling. In addition, the FPGA and the SFP will be coupled to the cooling system.
7.- Does piezo noise affect the PMTs?
Ans: Probably not, but if it does, the piezo will be turned off (without any EMI emission after power-off).
Remark: Are piezo devices mandatory (for the overall system) in every DOM? What is the minimum number on one string?
8.- Test the multiboot procedure integrated in the complete firmware of the CLB and using the optical network. This should be demonstrated to be 100% safe before the start of production.
9.- An EMI test has been done.
10.- Start-up procedure of the CLB (DOM)? It is recommended to perform integration tests before mass production.
Ans: The power sequence on the DOM has been defined and provided to the power group in KM3NeT. It has to be added to the documentation (TDR). The boot procedure also has to be described in the documentation (TDR).

4. Octopusses

General remarks

Design
It seems that no major modification has been applied to these boards.
Considerable effort has gone into improving the design of the boards for the production phase (PCB reviews and studies to optimize the test coverage).

Realization/Production PCB
The production will be carried out in two stages: 50 and 635 pieces. The presence of a serial number on the PCB is key to good traceability of the production. It would be advisable to apply the IPC-A-600 rules, level 2 as a minimum, for the acceptance of the PCB production.

Realization/Production assembly components
As mentioned, the IPC-A-610 rules, level 2 as a minimum, will be used for the acceptance of the component soldering.

Realization/Production test benches
An industrial test bench has been developed to fully check the two types of boards. A burn-in operation should be performed.

5. Nanobeacon

General remarks

Design
To prevent bad connections, would it be possible to remove the 3-pin input connector and solder the cable directly onto the PCB?

Realization/Production PCB
The presence of a serial number on the PCB is key to good traceability of the production. As mentioned, the IPC-A-600 rules, level 2 as a minimum, will be used for the acceptance of the PCB production.

Realization/Production assembly components
It would be advisable to apply the IPC-A-610 rules, level 2 as a minimum, for the acceptance of the component soldering.

Realization/Production test benches
A test bench has been developed to fully check the Nanobeacon board. A lot of information concerning the production will be stored in a database. Particular attention has been paid to the quality assurance dedicated to this production. A burn-in operation should be performed.

Questions (relevant only if the Nanobeacon circuit is inserted in PVC mechanics for the final integration):
The Nanobeacon circuit will be tested in the PVC mechanics of the test bench and removed after this operation.
Is another test intended once the Nanobeacon circuit is inserted in its final mechanics (to check the covering of the cable hole)?

Discussion during the session
1.- Can the Nanobeacon flashing add EMI on the PMTs? Consider using a shielded cable to the Nanobeacon.
Ans: We have the experience of Antares and it is probably negligible. In any case it will be tested.
2.- What is the reference of the Nanobeacon?
Ans: It is not the BOM table on the Nanobeacon PRR document.
3.- What is the range of operation of the Nanobeacon to be tested?
Ans: From 1 kHz to 20 kHz. The documentation has to be updated.
4.- What happens if the Nanobeacon short-circuits (e.g. at end of life)?
Ans: Should be tested.
5.- How can the Nanobeacon be disconnected/disabled?
Ans: By removing the trigger or the power supply.
6.- What is the light-emitting power of the Nanobeacon? Apply HALT to estimate lifetime.
Ans: We have performed tests in Antares that showed that a single LED can reach up to 300 meters. We expect similar behavior in KM3NeT.
7.- Consider additional glue for the LED on the PCB, for robustness during handling before and during installation.

6. Conclusions

The results presented during the review for all the different parts of the system are very impressive and show that the project is well advanced. Significant progress in the development of all individual components has been demonstrated. The modules have been developed, tested under various conditions and checked for manufacturability and reliability. However, the tests performed so far with the prototypes of each module do not guarantee the same behavior in the production boards, especially considering that new companies have been selected for the mass production. Thus, a pre-series production and thorough tests are highly recommended to corroborate the results obtained with the prototypes before the mass production is launched.
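
As an illustration of the functional test described in the general remarks (measured values compared to upper/lower limits, before and after burn-in), a minimal Python sketch is given here. The quantity names, limit values, serial number and record layout are hypothetical placeholders; they are not taken from any KM3NeT test specification and would have to be replaced by the values defined in the test-spec document of each board.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Acceptance window for one measured quantity (hypothetical values)."""
    lower: float
    upper: float
    unit: str


# Hypothetical test specification; real limits must come from the board's test-spec document.
TEST_SPEC = {
    "rail_3v3": Limit(3.20, 3.40, "V"),
    "rail_1v0": Limit(0.97, 1.03, "V"),
    "startup_time": Limit(0.0, 50.0, "ms"),
}


def check_board(serial, measurements, spec=TEST_SPEC):
    """Compare each measured value against its limits and return a result record."""
    results = []
    for name, limit in spec.items():
        value = measurements.get(name)  # None if the quantity was not measured
        passed = value is not None and limit.lower <= value <= limit.upper
        results.append({"quantity": name, "value": value, "unit": limit.unit,
                        "lower": limit.lower, "upper": limit.upper, "pass": passed})
    return {"serial": serial, "pass": all(r["pass"] for r in results), "results": results}


def compare_burn_in(before, after):
    """List quantities that passed before burn-in but fail afterwards."""
    ok_before = {r["quantity"] for r in before["results"] if r["pass"]}
    return [r["quantity"] for r in after["results"]
            if r["quantity"] in ok_before and not r["pass"]]


if __name__ == "__main__":
    pre = check_board("PB-0001", {"rail_3v3": 3.31, "rail_1v0": 1.00, "startup_time": 12.0})
    post = check_board("PB-0001", {"rail_3v3": 3.45, "rail_1v0": 1.00, "startup_time": 12.5})
    print(pre["pass"], post["pass"], compare_burn_in(pre, post))
```

Keeping the limits next to each returned value in the record is one possible way to make the results returned by a manufacturer self-describing; the actual data format remains to be defined by the collaboration.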
A general recommendation is that the preparation for production needs more attention: preparing detailed test specifications, or developing well-defined test set-ups that can be used by the manufacturers. These test systems should be able to discover all (or almost 100%) of the production failures without constant assistance from the mandator/designers.

As various manufacturers from different countries will be contracted to manufacture the different boards, special care should be taken to check their quality plans and control processes. For the same reason, a homogeneous template document should be used for the calls for tender (i.e. item 7 in §1); it will also include the production review schedule.

Using the FIDES evaluation is a useful methodology to improve reliability and must be applied to each electronic design.

It has to be proven that the dissipation power of the mechanics is sufficient to cool the electronics in the highest-performance operation mode, especially for hot-spot regions on the boards hosting FPGAs.

The results showed the performance of each single module in standalone mode. Some dedicated integration tests of the system assembled in the final DOM mechanical structure are recommended. Burn-in tests of several assembled modules might help to identify problems in the connections between the different modules and also with the mechanics.

Although the progress on individual modules is significant, verification at system level is lacking. Omitting system-level tests and verifications before mass production is very risky. This concerns the following issues:
o Power system: the combination of all power modules at full load, together with the long cabling, has not been validated. These modules could interact when they are connected over 600 m. The power-up sequence should be demonstrated, showing that at full load there is enough power margin from the central power distribution system. In such cases the connected CLBs should always boot and work correctly.
o Timing system: this has been partly demonstrated in small subsystems, not taking into account all the calibration parameters that are foreseen. This could have hardware implications. This system is crucial for physics event reconstruction.
o Data acquisition: it should be demonstrated that this works under stressed conditions without loss of timing information (network switches, optical network, “VEOC” and min. 3-5 CLBs).
o System boot and reset sequence: validate this at system level. Does the system start properly? Do all resets work as expected? Once at 4000 m depth, power switching is the only option when communication is lost (hanging CPUs, interrupted start-up, etc.).
o Multiboot: this is a very important and useful feature, in particular since not all validations have been performed. It is therefore recommended that this also be demonstrated in at least a sub-system (3-5 CLBs) in a WR network before starting mass production. During the review it was mentioned that the performance of this feature is not 100% guaranteed.
o Review: the review is organized by “individual” items, e.g. DAQ, optical system, power, CLB, etc. The system aspects between interacting subsystems are not properly covered by this review method. A general system architecture review, covering the specifications for the complete system and the subsystems, should have been part of the review procedure. E.g. reviewing the calibration, power and optical systems after the electronics mass production has started does not seem to be the best approach for a “highly reliable process”.

As a consequence of the previous comments, it is recommended that, before the complete production for phase 1 (31 DUs), a pre-production of about 5 models (for example) be produced and tested, especially for the main boards, which are the Power Board and the CLB.
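
For the data acquisition validation mentioned above (no loss of UDP packets or timing information with 3-5 CLBs under stressed conditions), one possible receiver-side check is to count gaps in a per-source sequence counter. The sketch below assumes, hypothetically, that each datagram carries a 32-bit sequence number that increases by one per packet and per CLB; the actual KM3NeT packet format is not described in this report, and the source names are placeholders.

```python
from collections import defaultdict

SEQ_MODULO = 2 ** 32  # assumed width of the (hypothetical) sequence counter


def count_lost_packets(packets):
    """Count missing packets per source from (source_id, sequence_number) pairs.

    Assumes packets from one source arrive in order; reordering or duplicates
    would need extra handling and are not covered by this sketch.
    """
    last_seen = {}
    lost = defaultdict(int)
    for source, seq in packets:
        if source in last_seen:
            # Gap between consecutive numbers, tolerant of counter wrap-around.
            gap = (seq - last_seen[source] - 1) % SEQ_MODULO
            lost[source] += gap
        else:
            lost[source] += 0  # register the source even if nothing is lost
        last_seen[source] = seq
    return dict(lost)


if __name__ == "__main__":
    stream = [("clb1", 0), ("clb2", 10), ("clb1", 1), ("clb1", 4), ("clb2", 11)]
    print(count_lost_packets(stream))  # {'clb1': 2, 'clb2': 0}
```

In a 48-hour run, such a counter would be expected to stay at zero for every CLB; any non-zero value points to a loss somewhere in the switch and fiber chain.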
The PRR introduction document contained the following statement: “DU-1 • Final validation of all technical solutions will come with operation of first line”, which was later explained by M. Circella: due to project time constraints, the start of mass production cannot be delayed until after the final validation of DU-1. As described before, this will cause significant risks, such as time-consuming debugging during integration, delays if modifications are needed, or costs (€) in case of a PCB re-design. However, electronic subsystems could be validated without (mechanical) integration in DU-1, and some in parallel.

A general study of risks and failure points, and a risk management plan for the experiment setup, would be very useful to give an overview of the implications of electronic board design and manufacturing for the experiment.

A general system Assembly, Integration, Test and Verification (AITV) process that includes the electronics boards would be a very helpful document for the tests of the electronic pre-series boards.