A New Hybrid Approach for Augmented Reality Maintenance in Scientific Facilities

Maintenance in scientific facilities is a difficult issue, especially in large and hazardous facilities, due to the complexity of the tasks and equipment involved. Augmented reality is a technology that has already shown great promise in the maintenance field: with its help, maintenance tasks can be carried out faster and more safely. The problem with current applications is that they are small-scale prototypes that do not easily scale to large facility maintenance. This paper presents a new hybrid approach that enables the creation of augmented reality maintenance applications for large and hazardous scientific facilities. A new augmented reality marker and an algorithm for its recognition are proposed. The performance of the algorithm is verified in three test cases, showing promising results in two of them. Improving robustness in the third test case, in which the camera moves quickly or light conditions are extreme, is the subject of further study. The proposed approach will be integrated into an existing augmented reality maintenance system.


Introduction
Maintenance can be carried out by human, semi-remote or fully-remote interventions. Human intervention in large and complex scientific facilities, such as CERN (European Organization for Nuclear Research) or GSI-FAIR (Facility for Antiproton and Ion Research), can be difficult due to the complexity of the procedures and machines to be maintained. This human intervention can be enhanced by means of visual in-place maintenance instructions and safety advice (e.g., maximum radiation dose reached, hot areas to avoid touching, etc.) provided by augmented reality (AR) technology, allowing the worker to carry out tasks more quickly and safely.
Hazardous facilities need to follow safety principles such as ALARP (As Low As Reasonably Practicable). This principle requires that the radiation dose received by workers be kept as low as possible, taking into account the balance between risks and benefits. Human interventions are planned according to the ALARP approach in order to ensure workers' safety inside these hazardous facilities. With the help of AR, the worker can complete maintenance tasks inside hazardous environments faster and avoid errors, which means that they are exposed to radiation for the minimum duration. AR is a technology used to merge virtual information with the real environment in real time. Milgram defined in [1] the Reality-Virtuality Continuum (Figure 1), where AR is an intermediate state between the real environment and a completely virtual environment. This means that the user remains conscious of the real environment, which is enhanced through multiple multimodal means of interaction.
AR has been used in many fields, such as medicine [2], [3], [4], education [5], [6], [7] and entertainment [8], [9], [10]. Maintenance is one field where AR can play an important role [11], [12]. Maintenance tasks can be augmented using this technology, allowing the worker to carry wearable AR equipment instead of traditional heavy and difficult-to-use manuals. AR is also an aid to safety, as it can warn workers when they are in danger or when a procedure has been carried out in the wrong way.
However, most AR-based maintenance projects developed so far are only prototypes for one specific machine or device. Some of these works can be found in [13], [14], [15]. Although they are very good prototypes, a more general-purpose application needs to be developed for real use, so that it can be reused in large facilities with a large variety of machine subsystems and devices to be maintained. In fact, there have been some approaches to this generalization [16], but they displayed certain problems regarding ease of use. Nevertheless, some state-of-the-art AR prototypes for maintenance have already shown good results in terms of ease of use [13], [17], task efficiency [13], [17], [18], [19] and decreasing the risk of accidents [18], proving that this technology can enhance maintenance work.
The work presented in this paper addresses the lack of general-purpose applications by means of a new approach for marker definition and recognition. A new AR marker is defined, as well as an algorithm for its detection. This new approach has been integrated into a system under development that will enable the ready creation of AR applications for maintenance in large scientific facilities.
The rest of the text is organized as follows: in Chapter 2, an overview of current problems and possible solutions in the AR maintenance field is presented. Chapter 3 describes the proposed solution and the algorithm designed for the detection of the proposed marker. Chapter 4 details the training and testing of the OCR-A font for the proposed approach. In Chapter 5, some issues concerning the deployment of the final solution in real facilities are presented. Finally, in Chapter 6, some conclusions and future work are discussed.

Augmented reality for maintenance
AR applications gather information from the environment (such as images captured by a camera or user input) and analyse it. Once the information has been processed, an output is created and delivered to the user (e.g., an augmented image or audio output) in an attempt to achieve a natural feeling for the user, meaning that they perceive the augmentation as a part of the real environment.
The procedure followed by a typical AR maintenance application is as follows: when a predefined pattern is recognized by a camera, the AR system looks in storage for the associated multimodal information and augments the marker with it. The multimodal information represents any instructions, procedural assistance and safety advice for the maintenance, allowing the worker to carry out the work easily and avoid mistakes. Figure 2 shows an example of how AR can enhance a maintenance procedure. The image shows a green overlay that represents the augmentation area (e.g., LCD glasses or screens). This means that the worker is able to see the real scene in the background and the augmented part on top of it. As can be seen in the image, virtual models and arrows are displayed on top of the original image, showing the instructions for the removal of the grey box from the machine. The information provided to the worker is not limited to 3D models; it can also include images, text messages, videos, audio instructions and audio/video conferences with supervisors. Any kind of multimodal help could be provided to the worker when necessary in order to make their work easier.

Identified problems
AR applications rely on pattern recognition algorithms in order to work. Among those algorithms, marker and markerless techniques are the most common approaches. Some also consider using QR (Quick Response) codes [20] for AR applications. However, the great majority of QR code applications rely on a different concept: when the code is recognized, the application gets the encoded information (which is mainly text, like a web address) but not the 3D spatial information of the QR code with respect to the camera. This cannot be considered AR but something different, like extended reality, as the virtual content is displayed in a different context (e.g., a web browser, a video player, etc.) to the real environment. Moreover, recognition in those applications is possible only if the QR code is seen clearly and with little distortion in the image. In order to use QR codes for genuine AR applications, a robust library would have to be developed. There are some research studies that use QR codes for AR applications [21], [22], but they are not available as libraries for the research community to test, which makes it impossible to compare their robustness with that shown by other AR libraries. In fact, the work presented in [21] relies upon camera pose estimation from the device sensors and how the user manipulates the device instead of real tracking of the QR code, while the work presented in [22], although promising in tracking the QR code, gives a low frame rate (10 fps), which means that it cannot be used for real-time applications. Moreover, most research trends in AR over the last decade have focused on marker and markerless techniques, as they have proven to be sufficiently robust and accurate for real-time tracking.
Markerless techniques have developed very quickly in recent years and are making AR applications look more natural, since they track points in real images and therefore do not need to add extra information to the environment. However, prior training of the images to be detected must be performed. For small or singular projects, this approach is good enough (and even better, as it can be more visual), since only a few images have to be trained. However, for large projects where hundreds or thousands of patterns are needed simultaneously, these techniques are not advisable because of the time required for training, the increase in recognition time as the number of images to be tracked grows, and the high memory use.
Marker-based techniques have some advantages over markerless techniques for large facilities, as they use less memory and the tracking is faster and more robust. Marker-based applications can be divided into two groups: marker patterns and 2D barcode markers. Marker patterns show similar problems to markerless techniques, as they require some training and the computational cost rises with the number of markers to track (although the required time and memory use are lower than in the previous technique). 2D barcode-based applications do not require a training process, as the system already knows every marker. Those markers have a predefined pattern made up of black and white squares inside a matrix. For this reason, the recognition is faster and more robust, and the method requires no training and uses less memory.
As a result of these considerations, 2D barcode markers appear to be a suitable technique for large-scale projects. However, this method still has restrictions. For example, with a 3x3 matrix only 64 different markers are available. This number can be increased by using larger matrices, but increasing the size of the matrix makes the recognition less robust for small marker sizes and for long distances between the marker and the camera.

Proposed solution
In order to cope with this limitation and provide a larger number of available markers, this paper proposes a new hybrid technique in which a new marker has been defined. The new marker is made up of a 2D barcode marker complemented by a text code above it. In previous approaches, marker differentiation is performed during the marker recognition step. In the proposed approach, the 2D barcode marker is used for detecting the position and location, while marker differentiation is performed by reading the text code.
This new approach will be integrated into a system under development that will enable large scientific facilities to easily design and develop augmented applications, including multimodal interaction, for the maintenance of scientific instruments as well as other related devices and machines.
AR maintenance researchers have shown great interest in multimodal interaction, such as speech recognition [23], gaze interaction [24], opportunistic controls [25] or combinations of various interaction modalities [26]. The AR-based maintenance system that will integrate the new approach presented in this paper implements several multimodal interaction interfaces, ranging from the most traditional ones (keyboard and mouse for input and augmented video as output) to more novel interfaces, such as gesture and marker interaction. The purpose of the final system is to be as general as possible, allowing for easy application design for large scientific facilities and a natural Human-Computer Interaction (HCI) for the maintenance worker.

System overview
This paper proposes a hybrid approach for marker detection that will be used in an AR-based maintenance system for large scientific facilities. The proposed system works as follows: it gets the image from the camera and detects the marker using the hybrid approach. Once the marker has been recognized, the system queries a database for the specific information associated with that marker and augments the final output accordingly. In most prototypes, the content information for the augmentation is hardcoded due to the low number of elements to display. However, this is not possible for large facilities, making it crucial to use a database that manages all the elements and information required by this kind of system. The purpose of this paper is to describe a new hybrid approach for marker detection and, therefore, the rest of the paper will explain only the details of the marker recognition subsystem.
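The marker-to-content lookup described above can be sketched with a small relational database. This is a hypothetical illustration, not the system's actual schema: the table name, column names and example content are all assumptions introduced here for clarity.

```python
import sqlite3

# Hypothetical sketch of the content database: each recognized text
# code maps to the multimodal content used for the augmentation.
# Table and column names are illustrative, not from the real system.
def make_content_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content (code TEXT PRIMARY KEY,"
        " model_path TEXT, instructions TEXT)"
    )
    conn.execute(
        "INSERT INTO content VALUES (?, ?, ?)",
        ("0ODB8T71", "models/grey_box.obj", "Remove the grey box."),
    )
    conn.commit()
    return conn

def lookup(conn: sqlite3.Connection, code: str):
    """Return the content record for a text code, or None if unknown."""
    return conn.execute(
        "SELECT model_path, instructions FROM content WHERE code = ?",
        (code,),
    ).fetchone()
```

Keeping this mapping in a database rather than hardcoding it is what allows hundreds or thousands of markers to be managed and edited without touching the application code.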

Hybrid approach overview
Figure 3 shows the proposed approach for the new marker definition and the technique for its recognition. The image containing the marker is fed into the system. This image contains the hybrid marker in any position and orientation with respect to the camera. The 2D barcode marker is detected first. With this information, the image is manipulated with computer vision techniques in order to achieve readable, screen-aligned text and, afterwards, the text code is segmented and read using optical character recognition (OCR) techniques. The following section explains this procedure in more detail.
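The recognition pipeline just outlined (detect barcode, rectify, segment, read text) can be sketched as a chain of stages. Each stage below is a stub standing in for the real component (ARToolKit detection, homography rectification, OCR); the function names and the dictionary-based "frame" are assumptions made for illustration only.

```python
# Illustrative skeleton of the hybrid marker pipeline. Every stage is
# a stand-in: in the real system, detection is done by a 2D barcode
# library, rectification by a homography, and reading by an OCR engine.

def detect_barcode_marker(frame):
    """Stand-in for 2D barcode detection: returns corner/orientation
    data, or None if no marker is visible in this frame."""
    return frame.get("marker")

def rectify(frame, marker):
    """Stand-in for homography-based rectification of the marker area."""
    return {"aligned": True, "text_region": frame["text"]}

def read_text_code(rectified):
    """Stand-in for segmentation + OCR of the text code."""
    return rectified["text_region"] if rectified["aligned"] else None

def recognize_hybrid_marker(frame):
    marker = detect_barcode_marker(frame)
    if marker is None:
        return None          # nothing to augment in this frame
    rectified = rectify(frame, marker)
    return read_text_code(rectified)
```

The key design point the sketch captures is the ordering: the text code is only searched for after (and relative to) a successful barcode detection, never in the whole image.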

Algorithm description
The current state-of-the-art recognition libraries for 2D barcode markers are very robust and work smoothly on current computers. For this reason, this work utilizes an already available library.
The algorithm first uses ARToolKit [27] for 2D barcode detection and tracking. ARToolKit is a well-known marker recognition library and one of the most commonly used libraries in AR applications due to its accuracy and robustness. It provides information about the marker, such as the screen positions of its corners in the image and its orientation, which will be useful in the next step.
The goal after detecting the marker is to read the text code above it, as trying to read the whole image can be time consuming. The marker is seen in the image in a distorted way. Marker recognition is very robust even if the marker is not seen as a perfect square, but text recognition is very weak if the text is not screen-aligned. In order to obtain screen-aligned text, it is necessary to rectify the image. As the marker is a planar surface, a suitable way of reconstructing the image is to compute a homography [28] between the image captured by the camera and a new, rectified image. A homography gives the correspondence between points in two images (or planes in general). This means that it is possible to calculate the correspondence between points in the non-screen-aligned marker and those in a screen-aligned square. The homography between two planes is defined by a non-singular 3x3 matrix. The matrix is calculated from the correspondences between the known coordinates of points in the original and rectified images. Eight equations are needed to compute the homography matrix, which means that the minimum number of point correspondences is four (as each correspondence contributes one equation in x and one in y).
Once the marker has been detected, the homography is computed between the four corners of the marker and a virtual square. With the orientation of the marker, it is possible to know the order of the corners and, thus, form the correspondences with the virtual square. The homography matrix obtained is used to create the rectified image. In Figure 4, both images (original and rectified) are shown. As can be seen, the four corners of the original image have their corresponding corners in the rectified image.
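The homography estimation from the four corner correspondences can be sketched with the standard direct linear transform (DLT). This is a minimal numpy illustration under the assumption of exact, noise-free corners; it is not the paper's actual implementation, which is not specified.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Estimate the 3x3 homography H mapping four (x, y) points in
    `src` to four (u, v) points in `dst`, via the DLT: each point
    correspondence yields two linear equations in the 9 entries of H,
    and H is the null vector of the resulting 8x9 system."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.array(rows, dtype=float)
    # The null vector is the right singular vector with the smallest
    # singular value (H is defined only up to scale).
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)

def apply_homography(H, point):
    """Map a point through H using homogeneous coordinates."""
    x, y = point
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

In the real system the estimated matrix would then be used to warp the whole camera image so that the marker (and the text code above it) appears as a screen-aligned square.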
The text in the rectified image is suitable for the OCR process. As mentioned before, trying to detect text in the whole image can be a highly time consuming process. For this reason, the area above the marker is segmented and scaled into a new image, as can be seen in Figure 5. The new image is ready for OCR, but it can be blurred for several reasons (e.g., the image has been taken in motion, the marker is very small or far from the camera, etc.). In order to achieve better recognition, the image is sharpened using the unsharp masking (USM) technique before the actual text code recognition. The USM technique consists of creating a blurred image that is subtracted from the original image, thus creating an unsharp mask (a threshold is used to decide which pixels will define the mask). This mask is then used in combination with a high-contrast version of the original image and the original image itself to create the final sharpened output. Figure 6 shows two examples of the original images (left) and the images obtained after unsharp masking (right). As can be seen, the USM method makes the image slightly sharper. At this stage of the process, the image is fed to the OCR engine. Although the image is very small, the OCR process is still time consuming, and so the homography computation, image restoration and OCR reading are carried out in a different thread. Figure 7 shows the two main threads inside the system. The main thread is in charge of image acquisition, 2D barcode marker detection, multimodal augmentation and final output, while the OCR thread gets the raw image from the main thread as well as the information resulting from the 2D barcode marker detection. The output from the OCR thread is fed back to the main thread in order to get the correct multimodal information for the augmentation. The OCR thread enables continuous checking of the text code above the marker, while the main thread works in real time without freezing, allowing the system to properly identify the real text even if it is missed in some frames due to blurred or defocused images.
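The unsharp masking step can be sketched on a greyscale image with plain numpy: blur, subtract to obtain the detail (the "mask"), threshold it, and add the masked detail back with some gain. A box blur stands in for whichever blur the real system uses, and all parameter values are illustrative assumptions.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k x k box blur with edge replication (a stand-in for the
    blur used by the real USM implementation)."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def unsharp_mask(img, k=3, amount=1.0, threshold=2.0):
    """Sharpen a greyscale uint8 image: the thresholded difference
    between the image and its blurred copy is scaled and added back,
    so only real edges (detail above the threshold) get amplified."""
    detail = img.astype(float) - box_blur(img, k)
    mask = np.abs(detail) > threshold
    sharpened = img + amount * detail * mask
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```

The threshold keeps flat regions (and their sensor noise) untouched while the contrast across character edges is boosted, which is what helps the OCR engine on slightly blurred frames.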

OCR-A font training and testing
OCR technology has certain problems with characters that can be mismatched. For example, the number '0' and the capital letter 'O' can easily be confused. For the purpose of robust character recognition, a specific OCR font called 'OCR-A' [29] is used in this study as an alternative to traditional fonts. OCR-A is a text font that is meant to be easy to recognize for both computers and humans. This font is standardized as ISO 1073-1:1976 [30]. The OCR-A font is monospaced (it has a fixed width), which eases not only recognition but also the marker creation process, as the marker also has a fixed size. The OCR system used in this work was originally trained to recognize sans-serif fonts by default. For that reason, in this project the system has been trained to recognize the OCR-A font. After the training, some tests were made to compare the results in character recognition. To do so, OCR-A has been tested against a sans-serif font (Arial).
The tests consist of a series of recognition loops. Every loop consists of 200 readings of the text above one marker. The texts are eight characters long and contain characters that are likely to be mismatched. An example of such a text could be "0ODB8T71". The recognized text is compared against the real text, which is introduced into the system by the user through the keyboard. A success means that every character has been properly recognized. If one or more characters are not recognized or are mismatched, the reading is considered a failure. After the 200 readings, the percentage of success is calculated. This process has been repeated six times for every case study. In order to get more realistic results, the tests were carried out with a conventional webcam in autofocus mode in an attempt to recreate an environment close to the final, real use.
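The evaluation protocol above is easy to pin down in code: a reading succeeds only if the full eight-character string matches, and the success rate is the fraction of successful readings over a 200-reading loop. The simulated OCR below is a stand-in introduced here (it corrupts characters with a fixed per-character error probability) so the protocol itself can be exercised without a camera.

```python
import random

def is_success(read_text: str, ground_truth: str) -> bool:
    """A reading counts as a success only if every character matches."""
    return read_text == ground_truth

def run_loop(ground_truth, simulated_ocr, readings=200):
    """One recognition loop: `readings` attempts, success rate in %."""
    successes = sum(
        is_success(simulated_ocr(ground_truth), ground_truth)
        for _ in range(readings)
    )
    return 100.0 * successes / readings

def noisy_ocr(p_char_error, rng):
    """Stand-in OCR that garbles each character with probability
    p_char_error (a placeholder for blur/defocus failures)."""
    def ocr(text):
        return "".join(
            c if rng.random() >= p_char_error else "?" for c in text
        )
    return ocr
```

Note how unforgiving the all-or-nothing criterion is: with a per-character error rate p, the expected success rate is roughly (1 - p)^8, which is why even modest blur drives the measured rate down sharply.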
Three case studies at two different resolutions were carried out. The selected resolutions were 800x600 pixels and 640x480 pixels, which are very common values for most webcams. Resolutions under those values have proven to give very poor performance for the aims of the application. The three case studies are as follows:
• A static camera, consisting of the recognition of text above one marker with both the marker and the camera in a static position. This is the most stable case, as the images are sharp (unless the autofocus fails) and, thus, the recognition is more robust.
• A handheld camera, a typical use of the application where the marker is fixed over a table (or attached to a static surface) and the user holds the camera trying to avoid strong movement. In this case, some images may be blurred due to the user's pulse or some failure in the autofocus.
• A moving camera, which gives the worst test results; here, the camera is randomly and continuously moved and zoomed by hand so that most frames are blurred or even defocused. This is not a desirable use of the final application, as these movements make the viewing experience barely satisfactory for the user.
Figure 8 shows the results of these tests. For every case study, the mean success rate of the six repetitions (200 readings each) has been calculated in percentage terms. The standard deviation is also displayed in the figure. As can be seen, OCR-A gives better results than Arial in all of the case studies. The success rate is higher in all cases when using the OCR-A font, being notably high when using a static camera. The results from the moving camera (strong movements) are, as expected, not very robust, as the movements make the images blurred or defocused. However, the results are still much better in the case of the OCR-A font. Although these results may not seem good in the case of the moving camera, it is important to point out that the algorithm is continuously reading the text code, which means that it will work even if some frames are blurred or defocused due to fast camera movement.
The high values of standard deviation are a consequence of mistakes in recognition. If the text contains characters that are easy to mismatch, the recognition will fail in most readings, yielding a low success rate. On the other hand, texts that are more robust with respect to OCR recognition will provide better results. In the best cases, when not using easily mismatched characters, Arial could achieve results almost as good as OCR-A, but for general use (i.e., using all alphanumeric characters), Arial will fail more frequently than OCR-A.
It is important to point out that the failures produced by the OCR-A font are, in most cases, due to a missed character caused by a blurred image or a similar problem rather than a mismatch between characters. Arial failures, however, are more closely related to mismatched characters, such as recognizing '0' as 'O', 'B' as '8' or '6' as 'G'.
Special character recognition (meaning non-alphanumeric character recognition) is also an important issue, as it increases the number of available codes. After the OCR-A testing process explained above, an additional effort was made to allow for the highest number of available characters in the final system. In order to increase that number, a second testing process was carried out. All characters (alphanumeric and special) in the OCR-A character set were trained, although not all of them will be used. In the preliminary tests of this second process, twenty-two of the special characters showed robust results in the recognition process. Eight special characters are usually recognized, but the recognition is not robust enough for them to be included in the system. Finally, eighteen special characters are poorly recognized or not recognized at all, so those characters will not be included in the system either.
After the selection of the twenty-two special characters that work robustly, a new test including only those special characters in the text code above the marker was carried out. Three different texts were tested. For every text, ten loops of 200 readings per loop were performed. The texts used and the results of this test are shown in Figure 9. As can be seen, the results are not as good as expected. The reason is that some characters are sometimes mismatched. The first text contains two characters that are sometimes mismatched with another two characters. The second text, which gives better results, contains only one character that is mismatched with another character. None of the characters of the third text is mismatched, and thus the success rate is over 90%. As can be guessed from the standard deviations of the first two texts, the problematic characters are properly recognized most of the time, but the high probability of a mismatch advises against their use. For this reason, the three problematic characters were removed for the last test. It was also noticed that one further character, although well recognized most of the time, may cause some mismatching problems, and for that reason it was removed as well. In the last test, three new texts including only the eighteen selected characters were read. In this test, three loops of 200 readings per loop were carried out for every text in three different cases (static camera, handheld camera and moving camera, all with a resolution of 800x600). The results of this test are also displayed in Figure 9.
As can be seen, the results are very similar to those obtained in Figure 8, so ultimately those eighteen characters will be used together with the alphanumeric characters.

Deployment issues
The work presented in this paper proposes a new method that will be integrated into a system under development that enables the creation and execution of AR-based applications for maintenance in large scientific facilities. For this reason, some issues have to be considered in the deployment of the final solution. This chapter presents some of these considerations.

Real facilities' conditions
One of the most important issues in these kinds of systems is to test them against real conditions. For this reason, some preliminary tests were performed in a real facility. The chosen facility was a 40 m long, real-scale (in height and width) LHC (Large Hadron Collider) mock-up at CERN.
The tests were performed using different markers over a real collimator. They showed that the marker can be properly recognized in real cases by the proposed algorithm, even under different light conditions (natural, halogen and actual facility lighting were tested).
Figure 10 provides an example of those tests (from top to bottom: the original frame, the rectified image and the segmented text). In this particular case, the marker used is white instead of black, unlike those presented in previous images. Both black and white markers were tested over the collimator, showing similar results in terms of marker detection and text recognition. Another aspect that has to be taken into consideration is radiation. During the tests, paper markers were used. In many cases, the final system will need permanent markers on the devices. As some areas to be maintained may have high radiation levels, a proper material for the markers has to be used. If a radiation-resistant material is not used, the radiation may modify the appearance of the marker or even destroy it. For this reason, markers made of anodized aluminium will be used in those cases, following a similar approach to that presented in [31], where photogrammetric targets made from anodized aluminium are used for measuring the position of the collimator.

Training for workers
AR is an intuitive technology with a quick learning curve, such that even users who have never used an AR application before have been seen to use one with good results [32]. For this reason, the final system that includes the approach presented in this paper may be used even without previous training. However, it is always advisable to provide workers with a short training period explaining the most basic features of the system and the concept of AR itself. As this training will be minimal in most cases, workers do not need to learn every detail of every machine or device to be maintained, since this information can be provided by the AR application itself. For this reason, the costs involved in the training process can be reduced.

Conclusions
This paper proposes a new hybrid approach for AR applications oriented towards large scientific maintenance projects with hundreds or thousands of different marker patterns to be detected. The marker used in the proposed system is a new concept of marker composed of a traditional 2D barcode marker and a text code string.
Pattern recognition and computer vision algorithms have been used to detect the marker, segment the text code and read its characters.The final marker is then augmented according to the information contained in a database.
The pattern recognition algorithms are robust to marker scaling and rotations. However, the OCR method is sensitive to large image rotations. For this reason, the original image needs to be rectified so that the text to be read is in the right orientation. For that purpose, a homography between the four corners of the marker and four virtual points is computed. With the help of the homography matrix, the original image can be rectified, independently of the marker rotation, into a properly aligned image where the text string is displayed horizontally.
In order to verify the performance and quality of the proposed approach containing marker recognition and OCR, some tests were carried out.These tests include not only the performance of the system itself but also a comparison between a sans-serif font and an OCR-A font.
From the results, it can be concluded that the OCR-A font gives better performance in terms of success rate for the different tests and, thus, the OCR-A font was selected as the most suitable font for the final system.
This paper has also presented certain issues to take into account when the final application is deployed. These issues concern real facilities' conditions and training for workers.

Future work
Most AR applications ground their perception of the real environment in computer vision techniques. The proposed work follows the same approach. The work presented in [33] suggests that, in the future, indoor AR applications will integrate computer vision systems with external fixed tracking schemes. It is expected that more sensing technologies (such as RFID) will be tested and integrated into the final system as complementary environmental sensors if they prove advantageous in terms of more robust and accurate tracking.
Another important aspect is to improve the robustness of the marker recognition.For this purpose, new ideas will be explored, such as using infrared tracking methods, like in [34], together with reflective markers, as proposed in [35].
The study of how workers react to the final application will also be performed, probably at GSI facilities.It is important to know whether the information provided is sufficient and whether it is distracting workers from actual maintenance tasks.
An important issue in state-of-the-art AR maintenance applications is that they are usually hardcoded applications that cannot be used by non-experts [36]. Although the proposed system still lacks an authoring tool, some steps to solve this problem have already been taken (such as using databases), and an easy-to-use authoring tool is planned that will allow non-programmers to create and edit the final applications.

Acknowledgments
This research project has been supported by a Marie Curie Actions Initial Training Network Fellowship of the

Figure 2. An example of an AR application for maintenance.

Figure 3. Overview of the new marker approach and algorithm.

Figure 4. Original (top) and rectified (bottom) images. The red lines show the correspondence between corners.

Figure 7. Main threads and their interaction. The main thread is in charge of the main flow, from image acquisition to final output, while the OCR thread computes the computer vision steps used to recognize the text.

Figure 8. Comparison of the results of the tests carried out for OCR-A and Arial. The OCR-A font gives better results in all cases. Static camera tests showed the best results. Handheld camera tests also showed good results, as the text is recognized in the majority of cases. The worst results come (as expected) from the moving camera tests, as many images are blurred or defocused.

Figure 9. Results of the special character recognition tests. The results from the first test (twenty-two special characters) show that when some characters are present in the text, the success rate decreases for the same test conditions. The second test (eighteen special characters) shows that after the removal of four problematic characters, the results are very similar to those presented in Figure 8, where only alphanumeric characters were used.

Figure 10. One example of the tests performed at CERN. The images show (from top to bottom) the original frame, the rectified image and the segmented text.