Computer Graphics
1F10170044 – DONOV Veselin
Programming Tasks Report

Task-1: Implement the Bresenham's line algorithm

Additional info: The user enters the coordinates of the start and end points of the line, and its width.
Programming language used: C++
File: task-1.cpp

Code flow: (not in order of lines but in order of execution - process)

sign(x) – used later on to determine the sign of the difference between the start and end point coordinates, and in turn the direction of the line.
x_s, y_s – essentially x1 and y1, defined globally; the user inputs them in the console (s stands for start).
x_e, y_e – essentially x2 and y2, defined globally; the user inputs them in the console (e stands for end).
w – width of the line, defined globally; the user inputs it in the console.

main() – This is the main function executed when the program is run.
Row 88 – 91: The user is asked to input the parameters for the line to be drawn.
Row 93 – 94: The drawing library is initialized and the screen is set up.
Rows 76 - 82: The window is set up: its size is 1500x1500 with a black background, anything drawn is white, and the screen and drawing are set up to happen in the 1st quadrant.
Row 95: We call the function that eventually draws the line. The function takes the arguments input by the user and sends them over to the Bresenham's line algorithm so it can calculate the pixels.

The general function for calculating the closest pixel representation of the line to be drawn.

Row 49: The function that 'paints' the pixel.
glPointSize: Defines the size of the individual pixel, which is the line width value the user input at the beginning of the program.

Once the program is compiled and run, the following appears in the console and prompts the user to enter the data in the following format. Example values are given in case the user is wondering what to write.
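The pixel calculation described above can be sketched as follows. This is a minimal illustration in Python, not the C++ code from task-1.cpp: the function name bresenham and the error-term formulation are assumptions, and only sign, x_s, y_s, x_e and y_e come from the report. It only computes the pixel positions; drawing them at width w (glPointSize in the report) is left out.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def bresenham(x_s, y_s, x_e, y_e):
    """Return the integer pixels on the line from (x_s, y_s) to (x_e, y_e).

    Coordinates must be integers; the sign of each difference picks the
    step direction, so all octants are handled by the same loop.
    """
    dx = abs(x_e - x_s)
    dy = abs(y_e - y_s)
    step_x = sign(x_e - x_s)
    step_y = sign(y_e - y_s)
    err = dx - dy  # running error between the ideal line and the pixel grid
    x, y = x_s, y_s
    points = []
    while True:
        points.append((x, y))
        if x == x_e and y == y_e:
            break
        e2 = 2 * err
        if e2 > -dy:  # error allows a step along x
            err -= dy
            x += step_x
        if e2 < dx:   # error allows a step along y
            err += dx
            y += step_y
    return points
```

Calling bresenham(0, 0, 3, 0) gives the four pixels of a horizontal line, and a start point to the right of or above the end point works the same way because the step direction comes from sign.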
Once the parameters are input, the draw window appears, showing the image.

Task-2: Implement a simple paint tool

Programming language: JavaScript
File: task-2.html

Code flow: (not in order of lines but in order of execution - process)

First, I create a very simple HTML form for the user to interface with the drawing board, allowing them to choose from 4 colors and 3 thicknesses, as well as a clear button. This is far from the best way to do this in HTML and JS, but for the purpose of the task it serves as a supporting element.

Next is the DIV element, which I transform into a canvas in JS (this ensures browser support), and the jQuery library to simplify event handling and element selection.

Here I transform the div element into a canvas (an HTML element that can be used for drawing, etc.), add its necessary attributes and get its context so I can inject variables and elements into it later.

A simple clear function resets the canvas to its initial empty state, together with an event handler for clicking the clear button.

Next are the in-canvas mouse events that determine whether the user is drawing or not. First, if the user's mouse button is down, we get the relative coordinates and initiate the painting process. addClick() and redraw() will be explained later.

Next, if the user is moving the mouse while in the canvas and painting is on (meaning they are holding the mouse button down), we draw.

When the mouse button is released, or the mouse leaves the canvas, we stop drawing.

Next, we create the arrays that hold the drawn strokes' information. Because we are using a very simple drawing solution, each time the user draws a new line the canvas is cleared and all previous strokes are redrawn.

Then, the addClick function pushes all the new data to the arrays and also serves as an automatic detector of what color and thickness the user specified.
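The stroke bookkeeping can be illustrated with a small data-model sketch. It is written in Python purely for illustration and is not the JavaScript from task-2.html: the class StrokeStore, its method names and the rule that a non-dragged point is drawn as a dot are all assumptions.

```python
class StrokeStore:
    """Parallel lists holding every recorded point and its drawing settings."""

    def __init__(self):
        self.xs, self.ys = [], []
        self.dragging = []  # True when the point continues the current stroke
        self.colors, self.sizes = [], []

    def add_click(self, x, y, dragging, color, size):
        """Record one point together with the color and thickness in use."""
        self.xs.append(x)
        self.ys.append(y)
        self.dragging.append(dragging)
        self.colors.append(color)
        self.sizes.append(size)

    def segments(self):
        """Return (start, end, color, size) for each point, in drawing order.

        A dragged point is joined to the previous point; a plain click
        starts and ends at the same position (assumed to render as a dot).
        """
        result = []
        for i in range(len(self.xs)):
            end = (self.xs[i], self.ys[i])
            if self.dragging[i] and i > 0:
                start = (self.xs[i - 1], self.ys[i - 1])
            else:
                start = end
            result.append((start, end, self.colors[i], self.sizes[i]))
        return result
```

Because every point keeps its own color and size, a full redraw after clearing the canvas reproduces older strokes with the settings they were drawn with.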
The dragging property tells us whether it was a simple click by the user or whether a line is being drawn (refer to $('#canvas').mousemove).

Last is the redraw function. It clears the previously drawn strokes and redraws them along with the new one. It uses the built-in canvas functions for path drawing, color, thickness, etc. and a for loop to cycle through the stored data.

Task-x: INIAD Digital Signage Recognition Software (from Report 2)

Programming languages: Java, Python
Directory: task-x

For simplicity, only the main code files are included, to explain how the code works; the libraries and other components needed to actually build and run it are not. To see it work, refer to the video links added to Report 2. A lot of what I did while coding this was probably not done the ideal way, but I plan to work on the algorithm this summer vacation and apply some useful knowledge I gained from the Computer Graphics class, now that I better understand the whole process behind the functions I am using. Since this is quite a big task, but also not part of the material in class, I leave it up to you whether to grade it or not and how many tasks to count it as. Regardless, I'd like to hear what you think about it.

1. Java Mobile Application

In Report 2 I explained the overall process the algorithm uses and how it works. Here I will dig deeper into that by talking a bit more about the code itself. I had never written Java before creating this application, so there may be better ways of coding it. I am submitting the whole work file with some commented-out bits that were for testing, etc.

- I will skip the first ~65 rows of library imports, as you can simply see them in the code and they are automatically added to the project by Android Studio as I use them in the code.

Java is a strange language in my opinion, and far different from the languages I am used to and am comfortable writing in.
One of the most frustrating things for me, trying it for the first time, was learning how to properly declare variables and variable types (especially hard with the math ones), coming from languages like JavaScript, where a variable is a variable, nothing more. Figuring out how to fix the variables took a lot of trial and error, but I eventually made it work. I am not saying this is the correct way, but it works, which is all I needed in my case.

1. The app uses the camera and the accelerometer/gyroscope, so I have to initialize them properly. For some reason the built-in OpenCV camera function flipped the image. To avoid doing extra manipulations on each frame so that the user sees a proper camera view, I found a workaround by injecting the default Java CameraView instead.

Next is the required onCreate function, where I essentially set up the things I will be using below, like the Camera View, as well as ivImageProcessed, which is one of the test image views where I display the processed image before the final crop and sending to the server. It is visible in some of the videos. IvImageCropped is the second image view, displayed in the other corner and showing the final cropped and warped label. NumberTextView is exactly what it says: it displays the resulting room from the server.

Last, but not least, is the gyro, used for determining the pointing direction of the camera. If no gyro is available on the device, the app will just try to process images all the time. If a gyro is present, the coordinates are taken and used later.

As I said, Java is a weird language and requires all those functions that are pretty generic and self-explanatory, and that are part of every OpenCV Java tutorial in existence, or any Java app using a camera. They ensure the app stops using the necessary resources upon switching apps, turning off, etc. and re-enables them upon going back to the app.

Finally it is time to create the image view output.
In this case it is an 8-bit 4-channel OpenCV matrix (CV_8UC4), so it can be a full-color frame for the user to see, like a regular camera. Similar to QR code scanners, I draw a square frame in the middle of the view so the user knows to position the signage within it; checking and processing is then done only on that small area instead of the full view.

In the early versions I was processing a frame when the user clicked on the screen, hence the residual code on rows 176-181. It was later abandoned in favor of a more automated method after I made sure it works.

Next we have one of the key functions, onCameraFrame, which is run 24 times per second (24fps). However, processing every frame would put too much strain on the device, so I do a little hack (there are most likely better ways to do this, but I decided to do it the easiest way possible). I count the camera frames and only use every 6th frame, meaning the code proceeds 4 times each second. I am sure this can now be decreased further, for example to 2 times each second (every 12th frame), but every 6th frame worked fine so I decided to keep it.

'Running' is another check I do to avoid buffering and potential crashes. Since the following code takes time and also waits for a response from a server, once the code proceeds this running variable tells the code that a frame is already being processed, so there is no need to send a new one until a result is received.

If the frame is the 6th frame and no previous frame is being processed, we reset the frame count, let the code know we are running a processing pass and proceed to isOriented, which does the device orientation check. After a lot of tests and pointing the phone at walls, I managed to get the right values for a proper orientation check, in order to know if the user is pointing the camera at a wall with a potential signage on it, not the floor or ceiling.
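The frame throttling and the running flag can be sketched like this. It is a minimal Python illustration, not the Java from the app: the class FrameGate and its method names are hypothetical, the orientation result is passed in as a plain boolean, and the choice to let a frame through as soon as processing finishes (once at least 6 frames have been counted) is an assumption.

```python
class FrameGate:
    """Decide which camera frames are handed to the processing pipeline."""

    def __init__(self, every_nth=6):
        self.every_nth = every_nth  # 24 fps / 6 = 4 candidate frames per second
        self.frame_count = 0
        self.running = False        # True while a frame is still being processed

    def should_process(self, is_oriented):
        """Call once per camera frame; returns True if this frame is processed."""
        self.frame_count += 1
        if self.frame_count < self.every_nth or self.running:
            return False
        self.frame_count = 0
        self.running = True
        if not is_oriented:
            # Camera is not pointing at a wall: release the gate and wait.
            self.running = False
            return False
        return True

    def finish(self):
        """Call when the server result (or a failure) has been handled."""
        self.running = False
```

The gate stays closed between should_process returning True and finish being called, so a slow server response cannot cause frames to pile up.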
If those requirements are met, the candidate frame proceeds to the first frame processing function. If not, we let the code know running is finished and we are free to try another new frame.

Here we create the two frames I mentioned before (ivImageProcessed & IvImageCropped, which I don't display any more). These are the graySnap and bwSnap. Because the process used for the edge detection, contours, etc. would alter the frame too much for the processing algorithms of the actual signage, I create two clones. First, however, I crop and only keep the image in the centered square by using submat. Then I convert the second mat, which will be used in the final processing, from RGB to grayscale and apply binary thresholding: the potential white-ish numbers and text on signages become 255, and all the darker background colors (green, red, blue, etc.), essentially anything with a value below 160 (tests showed this is the best value under various lighting conditions), become 0 (black). The original samples of labels are stored in the server DB in the same way. The other mat, with the unprocessed crop, is sent on to the next processing functions.

As the function name states, the purpose is 'customContrast', where I do a series of image processing steps to ensure proper edge detection, and for that contrast is needed. But simply adjusting contrast is not enough; more steps are needed to emphasize the edges. I am using a custom HSV processing function to get the average histogram values of the image for the channels and use that as a threshold value. The getHistAverage function will be shown below. After applying the threshold with the custom value, I do a dilation, which fills the uncertain gaps left by the thresholding, and an erosion to eliminate edge fuzziness. Another threshold is applied, but unlike the previous one, which was binary inverted (243 for proper edge fix), now it is standard binary. Images of how all that looks can be found in Report 2.
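The crop to the centered square and the fixed threshold of 160 can be sketched on a plain list-of-rows grayscale image. This is a pure Python illustration, not the OpenCV submat and threshold calls used in the app: the function names are hypothetical, and treating a value of exactly 160 as white is an assumption, since the report only says that values below 160 become black.

```python
def center_square(gray, side):
    """Keep only the side x side square in the middle of the image."""
    height, width = len(gray), len(gray[0])
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in gray[top:top + side]]


def binary_threshold(gray, threshold=160, max_value=255):
    """Map every pixel below the threshold to 0 and the rest to max_value."""
    return [[max_value if px >= threshold else 0 for px in row] for row in gray]
```

For example, binary_threshold([[0, 159, 160, 255]]) turns the first two pixels black and the last two white.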
Then, finally, I apply a Canny filter to detect all those edges. Before coming up with this solution, as you can see in the code file, I was experimenting with LAB and individual channel manipulation to try and improve the image. After all this, the main big processing function comes into play.

The getHistAverage function is something I came across online and it really helped me. It is nothing special: it utilizes the Imgproc library and the rest is math. The comments pretty much explain it.

Since the next function is huge, I will split it into smaller pieces for better understanding. Everything between those - - - - - - - lines will be the protected void processing() function in rows 325 - 411.

---------------------------------------------------------------------------------

Here is where I really began hating Java with all its different data types. First I use the Canny-filtered image to find all the contours and their hierarchy using the simple approximation method, as we are looking for the simplest shapes and basically straight lines; no detail is needed. The purpose of all this is to get the outer edges of the signage, which we can then use to cut it out.

Next begins an agonizing loop over the collected contours to find the one matching our requirements, which are:
- The contour with the largest area (signage on a wall is the biggest thing in the frame)
- The shape resembles a 4-sided 2D polygon
- The shape is close enough to having 90deg corners (if the signage is at too big an angle from the camera, processing will become too difficult and inaccurate)

All those checks are done by the following:
- Row 340: Imgproc.approxPolyDP with the specified attributes tries to connect the closest edges together to form a polygon of some kind, storing the number of curves it has (polygon sides).
- Row 342: a check of the sides is done, and if there are indeed 4 sides and the polygon has the largest area so far (each one is checked and the largest one gets stored), we proceed to test the rest.
- Row 347: I calculate the absolute value of the cosine of the angle between each pair of edges, in order to ensure proper perspective correction and warping can be done. I only need to check the largest cosine in the figure, so the maximum one is selected for the check. Naturally, the cosine of 90deg is 0. Through testing I determined that the shape still allows for 97% accuracy with angles between 72deg and 108deg, which corresponds to a cosine of about 0.3 or -0.3, so an absolute value covers both cases.
- If the polygon has 4 sides, it has the biggest area and all angles are 72deg < X < 108deg, we are good to proceed onward.

If a candidate passed the test above, its id is recorded, and we then check whether we have one. If we do not, the program is told that running has stopped and we can try again.

If, however, we have the id of a winning candidate, we then need to make sure the contour size is enough for us to get accurate results, because if the signage was snapped from too far away, the polygon would be the biggest, yes, but not big enough. Through testing I determined that a contour size of at least 180 is needed. Again, if the contour size is smaller than 180, the program is told running has stopped and a new frame can be captured.

If, however, the contour size is big enough, it is time to get the coordinates of the 4 corners of the polygon, as we will need them later. Another approxPolyDP is applied, with more aggressive settings this time, to ensure accurate results. Then we put the coordinates of those 4 points in an array list.
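The corner-angle test from Row 347 can be sketched as follows. This is a minimal Python illustration, not the Java code from the app: the function names are hypothetical, the corners are assumed to be given in order around the polygon, and the 0.3 limit is the value quoted above.

```python
import math


def corner_cosine(prev_pt, corner, next_pt):
    """Absolute cosine of the angle at `corner` between its two edges."""
    ax, ay = prev_pt[0] - corner[0], prev_pt[1] - corner[1]
    bx, by = next_pt[0] - corner[0], next_pt[1] - corner[1]
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0:
        return 1.0  # repeated points: treat as a degenerate, rejected corner
    return abs((ax * bx + ay * by) / norm)


def max_corner_cosine(quad):
    """Largest absolute corner cosine over all corners of the polygon."""
    n = len(quad)
    return max(
        corner_cosine(quad[i - 1], quad[i], quad[(i + 1) % n]) for i in range(n)
    )


def is_near_rectangle(quad, max_cos=0.3):
    """True for 4-sided shapes whose corners all stay close to 90 degrees."""
    return len(quad) == 4 and max_corner_cosine(quad) < max_cos
```

Only the worst corner matters, which is why the maximum is compared with the limit: a square gives 0, while a parallelogram with 45 degree corners gives about 0.71 and is rejected.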
It has to be converted from one data type to another (this took me a few hours to fix), because Java… Then we warp and apply perspective correction to the original bwSnap we created in the beginning, with a custom function that uses the newly saved points' coordinates. I will talk about it after this. Once warped, we resize it to 200x200px, which, as explained in the report, was found to be an optimal size for storing sample signages. One final binary thresholding eliminates any possible grey values occurring during the resize or warp.

The warp function takes the image to be warped and its corners' coordinates. A new output matrix is created with the target size and, again because of Java, for the data types to be compatible we have to make a new empty array list of 4 corners, which is then converted to a matrix. Finally, a perspective transformation matrix is created from the startM and endM matrices we just created. This way Imgproc calculates the transformations needed for each point of startM to end up at the position of the corresponding point in endM. Then the actual warp is applied to the target image using the perspective transform matrix, keeping the size. The result is sent back.

Back in the main processing function, after the warp has returned the image of the target signage and it has been converted to a matrix of binary values, it is time to send it all to the server. To do that, the 2D matrix is turned into a 1D list, which is stringified and sent to the server by executing an HTTP POST request.

---------------------------------------------------------------------------------

This HTTP POST request function is probably the ugliest piece of code I have ever written, but I needed it to work as it was the final piece of the puzzle. You can take a look at it in the code; I will not go into much detail here as it is not closely related to the topic.
Initially the server ran locally on my computer, but later it was put on one of INIAD's Google Cloud servers. Once the server returns a response, if we have actual room results they are displayed, as seen on row 504. Either way, running finally stops here and the program can repeat itself.

There are a few other helper functions and things used in the code, as well as old functions I no longer use but keep there just in case and as a reminder.

2. Python Django Server

OpenCV for Python is way more pleasant to use, plus we studied Django in first year, so it was good practice. The server uses a SQLite3 database, which I will explain in a bit. I am submitting 2 files from the server. One is the main API processing function that handles the request from the mobile app and responds. The other one is the management one. It was used to batch add the whole set of label samples I batch processed with Photoshop. It is also used to check the contents of the database and manually add new labels, as they sometimes change or new ones are added. A lot of the code is commented, but I will go through it as well.

The label data in the database is split in 2: processor_smalllabels and processor_biglabels. The whole reason for this is described in more detail in Report 2. Several columns of information are stored for each entry, such as EN name, JP name, room number and a count of 'whites' (essentially the 1-s), again used for filtering when the target label's number of whites is too close to the border between small and big.

main-processing.py

This function essentially uses a csv file with the signages' labels and information, and a directory with preprocessed black and white 200x200px labels taken from the original design team who made them. Similarly to the Java application, binary thresholding is applied and then the data is joined into a string to be put in the database along with the information.
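Joining a thresholded label into a string and counting its whites can be sketched as below. This is a pure Python illustration, not the code from main-processing.py: the function names are hypothetical, and the storage format (one '0' or '1' character per pixel, row after row, with no separator) is an assumption, since the report does not spell it out.

```python
def encode_label(binary):
    """Flatten a 2D matrix of 0/1 pixels into a single string, row by row."""
    return "".join(str(px) for row in binary for px in row)


def count_whites(encoded):
    """Number of white (1) pixels in an encoded label."""
    return encoded.count("1")
```

A 2x2 label [[1, 0], [0, 1]] becomes the string "1001" with a white count of 2; a real 200x200 label would produce a 40000-character string.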
Depending on the number of whites, the labels are classified in two groups, big and small (amount of white), and added to the corresponding table. The rest of the commented code consists of previous ways I had for adding labels that didn't prove to be as effective as the finalized one.

The add function, along with several HTML inputs, allows for manual upload of labels to the database by inputting the necessary information and uploading the image. Again, binary thresholding is applied and the white pixels are counted to determine which table to save into.

api-processing.py

This is the function executed when the server receives the request from the mobile application. @csrf_exempt allows the request to be handled without the CSRF token and headers that would otherwise be required. In this case no sensitive information is sent, so it is ok.

The string data is then reshaped into the 200x200 matrix and the number of pixels with value 1 is counted. As explained in Report 2, if the whites are below 6200 the label is considered 'small', and if over 7600, 'big'. The values in between are considered unclear, so a mixed search is done. The "regular" property in the function calls means the orientation of the label is regular. I will explain why later.

searchSmall and searchBig are identical, with the only major difference being the table they look into. At this stage they are just copy-pastes of each other with slight differences, so I will only talk about one. searchMixed is a combination of both.

This is the regular case for the searchSmall function. Each table entry string is reshaped into the 200x200 matrix. Then a binary XOR is applied by subtracting the target and sample signage. A visual representation is given in Report 2. Finally, the non-zeros of the resulting matrix are counted (a non-zero means there was a difference between the input matrices). If that value is smaller than the smallest difference found so far, it is stored as the potential candidate.
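The reshape, the small/big/mixed decision and the XOR difference count can be sketched in pure Python. This is an illustration only, not the code from api-processing.py: the function names are hypothetical, the one-character-per-pixel string format is an assumption, and the 6200 and 7600 limits are the ones quoted above.

```python
def decode_label(encoded, size=200):
    """Reshape an encoded label string into a size x size matrix of 0/1."""
    if len(encoded) != size * size:
        raise ValueError("encoded label has the wrong length")
    return [
        [int(ch) for ch in encoded[r * size:(r + 1) * size]] for r in range(size)
    ]


def classify(whites, small_max=6200, big_min=7600):
    """Pick which table(s) to search from the number of white pixels."""
    if whites < small_max:
        return "small"
    if whites > big_min:
        return "big"
    return "mixed"


def difference(target, sample):
    """Count the pixels where the two binary matrices disagree (XOR)."""
    return sum(
        t ^ s
        for target_row, sample_row in zip(target, sample)
        for t, s in zip(target_row, sample_row)
    )
```

On 0/1 data the XOR and the absolute difference of two pixels are the same thing, which is why subtracting the matrices and counting non-zeros gives the number of mismatched pixels.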
After all sample labels have been checked and compared, the one with the smallest difference is stored. If this difference is less than 800, this is the winner and its data is sent back to the Android Java app to be displayed to the user.

Something not mentioned in Report 2, a new feature I coded after writing the report, is seen on rows 53-55. If you look at some of the videos from Report 2, depending on the angle at which the camera points at the signage, the label sometimes rotates. This happens because the approxPolyDP and 4-corner detection done in the Java app start from the top and move down, so the corner with the lowest Y-axis value is considered the start. At some angles that ends up being the top right corner of the signage instead of the top left. This causes the label to be rotated 90 degrees, which makes matching it to anything impossible. As a quick solution, instead of making the Java application even heavier by detecting which corner is first and rotating there, I use the server's far superior computing power to rotate the target signage and try to match it again if the first attempt was unsuccessful. This is why the 3 functions (searchSmall, searchBig and searchMixed) are recursive, and this is what the "regular" and "rotated" parameters are for. So after the image fails to match with the database, it is rotated in row 54 and searchSmall calls itself with the new data, telling itself to do the "rotated" case this time. The rotated case is almost the same, except that if it fails this time, it fails.

A lot of the code can be shortened, simplified and improved, but at this stage I am still figuring things out and adding specific cases for specific problems that arise, so it is easier to have the code copied in similar chunks to identify where the differences are before finalizing it and working on performance. Plus, at this moment I no longer own an Android device to physically test on, so the data I am using is dummy data.
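The match-then-rotate logic of the search functions described above can be sketched as one recursive function. This is a pure Python illustration, not the code from api-processing.py: the function names, the dictionary of samples and the clockwise rotation direction are assumptions, while the "regular" and "rotated" cases and the limit of 800 come from the report.

```python
def rotate_90(matrix):
    """Rotate a 2D matrix by 90 degrees clockwise (direction is an assumption)."""
    return [list(row) for row in zip(*matrix[::-1])]


def difference(target, sample):
    """Count the pixels where the two binary matrices disagree (XOR)."""
    return sum(
        t ^ s
        for target_row, sample_row in zip(target, sample)
        for t, s in zip(target_row, sample_row)
    )


def search(target, samples, orientation="regular", max_diff=800):
    """Return the name of the closest sample, or None if nothing is close enough.

    `samples` maps a label name to its binary matrix. If the regular attempt
    fails, the target is rotated once and the search calls itself again.
    """
    best_name, best_diff = None, None
    for name, sample in samples.items():
        diff = difference(target, sample)
        if best_diff is None or diff < best_diff:
            best_name, best_diff = name, diff
    if best_diff is not None and best_diff < max_diff:
        return best_name
    if orientation == "regular":
        return search(rotate_90(target), samples, "rotated", max_diff)
    return None  # the rotated attempt failed too
```

The orientation argument is what stops the recursion: the second call runs with "rotated", so a second failure returns None instead of rotating again.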