Final Project (IDEAS + Workflow)

IDEAS: Deciding on an idea for my final project was such a hard thing to do, mainly because there are so many things that I am always curious about and would like to make. So, after many ideas and sketches, I decided that I wanted to make something to solve a problem I face in my daily life, while being of interest to me, of course. So, I made a chart of what my interests are, and another chart of possible problems/inconveniences I usually encounter within those interests.

As a good starting point, I drew very rough sketches of "potential ways" I could solve these problems.

I love learning languages! I speak four languages in total, and Japanese is one of them. Japanese is very unique compared to other languages in that it uses three sets of "alphabets", or writing systems, depending on the origin of the word. The three sets are: Hiragana (for native Japanese words), Katakana (for words imported from other languages), and Kanji (Chinese characters used in combination with Hiragana that hint at a word's meaning). Any student of the Japanese language won't have problems memorizing Hiragana and Katakana, as both are easy to read, have only 46 characters each, and don't change in pronunciation according to the word they appear in. Kanji, on the other hand, consists of more than 2,000 characters whose pronunciations change depending on the word. Because I love reading Japanese novels, it is always daunting to go and look up the pronunciation or meaning of a certain kanji character or combination when I come across one I don't know. On top of that, keyboards that support kanji characters for typing only work when the pronunciation of the kanji is known, which is not ideal in my case.

There are only two available (and time-consuming) workarounds. One is using the Google Translate application to take a picture of the text and highlight the kanji you want to read; but this method doesn't give the reading, and instead simply gives the English translation. The other alternative is to use the iPhone's Chinese keyboard to write the kanji character with your finger, then look it up online or in a Japanese dictionary. However, some kanji have so many strokes and are so complicated that a user like me won't succeed in inputting them.

So, long story short, to solve this pressing problem, my idea for my final project is: a scanning pen for Kanji characters!

The pen works by having a scanner at its tip with which you scan the word. Then, through OCR (Optical Character Recognition), the image is converted to text and matched to the correct reading of the word in Hiragana.

To model this final project, I made a preliminary iteration in SolidWorks of how the pen would look. I like using SolidWorks because, as a mechanical engineer, it is a platform I am used to. Its interface can be a bit confusing at first, but watching a couple of tutorials helps, and it is definitely worth it! :)




Coding: OCR and Databases



OCRs: By searching for open source Optical Character Recognition platforms, I came across "Tesseract". I am currently fiddling with it to see how well it can detect Japanese text.

Tesseract: To get Tesseract working on my computer, I followed the instructions mentioned on its GitHub, which can be found here. Because I am using a Mac, I did all the installation through Homebrew. It is very straightforward, so just follow the steps outlined on the GitHub page.

I also downloaded the "tessdata" for Japanese and vertical Japanese (because vertical Japanese is what novels use), and added them to the tessdata folder within the main Tesseract folder. Since I needed a quick prototype to check how good Tesseract is at detecting characters, mainly Japanese Kanji characters, I used the example Xcode project that comes with the Mac version of Tesseract. I first tested on English, and noticed that it was accurate at turning screenshots of computer text into actual text. However, it wasn't very accurate when it came to handwritten text, which I tested by taking screenshots of my own handwriting. Useful note: to see where the tessdata folder (and Tesseract itself) is installed, type "brew list tesseract".

Inaccuracy in character detection when text is written by hand. "Ground" is read as "Around".

Because I will hopefully be using the pen for printed material (novels) rather than hand-written material, the inaccuracy mentioned above should not pose a hurdle.

To test horizontal Japanese writing, I took an image of Japanese text with my phone, sent it to my laptop, then took a screenshot of it through the Tesseract example code, which I changed from the default English text detection to Japanese detection. Note that to do that, you have to go to the file SLViewController.m and, within the function ImageToText, change the ocr.language from "@eng" to "@jap" for horizontal Japanese text.

Japanese Text Image.

Screenshot of Selected Japanese Text.

Horizontal Japanese Text In Image Converted To Pure Text Successfully.

As I have mentioned in a previous (very long ago) section of this page, Google has a wonderful extension that basically produces the output I would like my scanning pen to produce (its only limitation is that it works on pure text, not on images). That is, it provides the reading of the Chinese (Kanji) characters in the form of native Japanese characters (Hiragana). This Google extension is called Rikai-kun. The picture below illustrates what I mean: the highlighted text (also bordered in red) is written in Kanji, and the text in the blue bubble (bordered in green) is the Hiragana reading of the Kanji.

Kanji to Hiragana.

Rikaikun Google Extension: After some searching online, I found that Rikaikun is actually based on a free open source Japanese dictionary called JMdict/EDICT.

Setup: I am thinking of using an RPi Zero, a microcontroller that I build myself, a camera, and an LCD screen. The LCD screen would connect to the microcontroller, and the microcontroller would talk to the RPi. The database would live on the RPi. My idea is that when a picture is taken, it is sent to the RPi, which in turn uses OCR to produce text and translate it. The RPi would then communicate with the microcontroller and send it this text, which the microcontroller would display on the LCD screen. I am thinking of something along the lines of this tutorial, except that I don't plan on building a C# interface with Visual Studio.

Proposed Workflow.


On the fab website, the suggested LCD is a Hitachi one, and it supports KANA characters, which is good for the purposes of this project.

LCD Supports Kana

To run Tesseract through the command line in Terminal, use the following commands (this assumes Tesseract is installed). Note that the default detected language is English. Also note that the default output location for Tesseract, if a directory is not specified, is the home directory [which can be accessed on a Mac by clicking Finder ==> Go ==> Home].

>> tesseract file_location/file_name.extension output_location/output_name
>> cd output_location
>> cat output_name.txt

Note that the default output format for Tesseract is "txt". Also note that if your directories or file names have spaces in them, wrap them in quotation marks.

Python & Tesseract: I researched which coding platform would work best with the RPi and Tesseract for OCR purposes, and found a good number of tutorials/pages on how to combine these three elements. To use Python with Tesseract, we first must install a couple of dependencies. Run the following commands in your terminal window:

>> sudo pip install pillow
>> sudo pip install pytesseract

If you do not have Python installed and are using a Mac, run the command "brew install python". That also assumes you have Homebrew installed; if you don't ==> run the command /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

I created a simple Python script as a draft. The script currently works for English, but pytesseract hasn't been set up for Japanese yet, which is why if I give it an image of Japanese text, I receive an error message. If you want to create a Python file from Terminal ==> "touch file_name.py". This creates the file in whatever directory the terminal is currently in.

Sample Code

To run Python code from the command line ==> python ./code_name.py
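Since the sample code above is only shown as a picture, here is a minimal sketch of what such a first draft can look like, assuming Pillow and pytesseract are installed; the file name is just a placeholder.

from PIL import Image
import pytesseract

# Placeholder image: any screenshot containing English text
im = Image.open('english_screenshot.png')

# The default language is English, so no language argument is needed yet
text = pytesseract.image_to_string(im)
print(text)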


TROUBLESHOOTING AND BUGS: To actually install Tesseract and get all the languages installed correctly through Terminal, I did the following. I installed Tesseract as outlined earlier, and then typed the following in Terminal:

>> brew list tesseract

This allowed me to see where the tessdata directory is located. For me, it happened that my tessdata is located in the following directory ==> /usr/local/Cellar/tesseract/4.0.0/share/tessdata/

>> cd /usr/local/Cellar/tesseract/4.0.0/share/tessdata/

After doing that, I downloaded the Japanese tessdata from GitHub for both horizontal and vertical Japanese. Then I simply moved the files into tessdata using Terminal:

>> mv ~/Downloads/jpn.traineddata /usr/local/Cellar/tesseract/4.0.0/share/tessdata/
>> mv ~/Downloads/jpn_vert.traineddata /usr/local/Cellar/tesseract/4.0.0/share/tessdata/
Tesseract Command Through Terminal Working For Japanese Text

Sample Python Code (pytesserect) Working For Japanese Text

Mapping Kanji Text to Database: I downloaded JMdict and will use it as my database, since it includes Kanji mapped to Kana. I looked at this tutorial to help me get started.

Parsing: The JMdict database is structured so that each entry contains a structure of elements, which means that if we want to access the entries and the elements within them, we have to "parse" the data, according to this page.

Example of Entry With Its Elements/Children
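For reference, a simplified JMdict entry looks roughly like the following (trimmed down for illustration): the kanji spelling sits in the keb tag inside k_ele, and the kana reading sits in the reb tag inside r_ele.

<entry>
  <k_ele>
    <keb>時間</keb>
  </k_ele>
  <r_ele>
    <reb>じかん</reb>
  </r_ele>
  <sense>
    <gloss>time</gloss>
  </sense>
</entry>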

As a prototype for the code/database combination, I wrote a script that randomly selects a Japanese entry in the database and prints only its Kanji and Kana readings. I ran into some difficulty breaking up the strings (since XML is just weird), so my friends Sarbari and Haripriya (Course 6) helped explain things and guide me through. Notice how in the picture both the Kanji and the Kana are printed.

Code in Python for Extracting Kana and Kanji from Database. Kanji lies in "keb" tag, and kana lies in "reb" tag.

Final Code: I was able to successfully extract the kanji and kana elements using Beautiful Soup. The following code was run only once, to create the new database for me. The next step is to take this database and put it into an even more useful format using Pandas.

# encoding: utf-8
from bs4 import BeautifulSoup

print('start opn')
f = open('JMdict_e copy.xml', 'r', encoding='utf-8')
print('finish opn')

# Parse the whole JMdict XML file with Beautiful Soup's XML parser
soup = BeautifulSoup(''.join(f), features='xml')
print('soup done')

count = 0
for element in soup.find_all('entry'):
    try:
        kanji = element.find('keb').text   # kanji spelling
    except:
        pass
    kana = element.find('reb').text        # kana reading
    count += 1
    try:
        print(count, kanji, kana)
        if kana and kanji:
            # Append each kanji/kana pair as a tab-separated line
            with open('my_output_file.txt', 'a', encoding='utf-8') as fd:
                fd.write(kanji + '\t' + kana + '\n')
    except:
        pass
Excerpt of Kanji/Kana only Database

Creating Database Columns with Pandas: My friend Haripriya recommended that I use Pandas for easy database access. For now, I have consolidated my data into a Python dictionary, and have this iteration of the code running so that it returns the Kana reading for a kanji detected by Tesseract.

from PIL import Image
import pytesseract

# Build a kanji -> kana lookup dictionary from the tab-separated database
d = {}
with open('my_output_file.txt', 'r', encoding='utf-8') as data:
    for line in data:
        (key, val) = line.split()
        d[key] = val

filename = 'jikan.png'                                 # image with Kanji text
im = Image.open(filename)                              # open the image
text_2 = pytesseract.image_to_string(im, lang='jpn')   # specify the language
text_2 = ''.join(text_2.split())                       # remove spaces/newlines
print(text_2)
print(d[text_2])                                       # Kana reading
"Jikan" converted from Kanji to Kana;


Communication Between RPi and AVR ATtiny44: The idea is to have a button that controls the camera, and an LCD attached to the AVR ATtiny44 microcontroller I'll make. The camera is connected to the RPi. However, for the camera to turn on and for text to be displayed, communication is required between the two. I looked at this tutorial to figure out how to get started.

RPI zero and Camera

Using the camera documentation for the Raspberry Pi, I put together a script that activates the camera preview and takes a picture, which is saved to the desktop of the Pi. The process was pretty straightforward, except that I ran into two problems: first, if you are using a Pi Zero, it requires a separate camera-to-Pi adapter cable; second, the camera might be out of focus, so to focus it you can use a pair of tweezers to rotate the camera's lens rim.

I was able to successfully connect the Raspberry Pi camera module to the Pi and put together a script that takes a picture and saves it to the desktop. The code is below (note that this code is run in Python on the RPi):

import time
from time import sleep
import picamera

with picamera.PiCamera() as camera:
    camera.start_preview()
    sleep(5)    # number of seconds that the preview should be on
    camera.capture('/home/pi/Desktop/image.jpg')
    camera.stop_preview()

Tesseract and Pi Camera

The next step was to have Tesseract installed on the Pi, the same way it was installed on my laptop (including the language data files). I also had to install pytesseract, which I did by simply running the Linux commands for installing it, but before that I had to install some dependencies: Python setuptools, the JPEG library that Pillow uses, and Pillow itself. I also had to use sudo and the --no-cache-dir flag, because the Pi would otherwise complain about memory errors!
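For reference, the commands were along these lines (the exact package names may have differed slightly on my version of Raspbian):

>> sudo apt-get install python-setuptools libjpeg-dev
>> sudo pip install --no-cache-dir pillow
>> sudo pip install --no-cache-dir pytesseract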

I was then ready to test tesseract with the pi camera. I simply used my code from earlier but instead of a screenshot image, I used the image taken by the pi camera.

For the best chance of a picture being processed correctly by Tesseract, the image DPI should be around 300, and the text should be against a white background without any noise (even the slightest noise may skew results).
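To get closer to those conditions, a little preprocessing can help. Below is a sketch (not part of my final pipeline) of the kind of cleanup that can be done with Pillow before handing the image to Tesseract; the threshold value is a guess that would need tuning for real camera images.

from PIL import Image

def clean_for_ocr(path, out_path, threshold=160):
    # Grayscale, then hard-threshold so the text sits on a plain white background
    im = Image.open(path).convert('L')
    im = im.point(lambda p: 255 if p > threshold else 0)
    im.save(out_path, dpi=(300, 300))   # tag the saved image as 300 dpi
    return out_path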


Pi camera, rpi and Tesseract working together.


Switching LCD Modules



As you might recall from Output Devices week, I ran into a lot of trouble establishing serial communication with the LCD that was given to me in lab, and although I debugged a lot with Ben, we couldn't figure out what was going on. So, Ben suggested I buy a new LCD module and establish I2C communication with it. The schematic for this board is uploaded on the Networking week page. The only thing that should be omitted is the voltage regulator, which was unnecessary to begin with in this case. The voltage regulator prevented the LCD from receiving the 3.3 V it needed and was thus removed.

New Board Milled for I2C

LCD works with English Text

LCD Displays Katakana Text (My name in Japanese: Ma-ru-wa)

One thing that had to change in my database with this LCD module is that the module only supports Katakana, and NOT Hiragana. In addition, it understands Katakana only as raw byte values. So, I had to change my database so that it maps each kanji entry to these bytes. For example, 時間 would be \xbc\xde\xb6\xdd. You can see a conversion chart below that I put together. Thanks to Ben for pointing this conversion out to me when plain Katakana was not printing and I was confused. Also below is example code for communicating with the LCD module over I2C, Arduino style. Note that if you are using an ATtiny, you have to include a software serial library, declare your serial as a variable, and assign it the pins corresponding to RX and TX, which are (0,1). This is not necessary if you use an ATmega, as you can call Serial normally (the ATmega is actually very similar to an Arduino). Another important point: for I2C communication between the LCD and the AVR of choice, you use either the TinyWireM library for the ATtiny, or the Wire library for the Arduino or ATmega.

//Example Code for attiny
#include <TinyWireM.h>
//#include <SoftwareSerial.h>

#define DISPLAY_ADDRESS1 0x72   //This is the default address of the OpenLCD

int SLength;
String Word;
int i;
//SoftwareSerial mySerial(0,1);

void setup()
{
  TinyWireM.begin();            //Join the bus as master
  //By default .begin() will set I2C SCL to Standard Speed mode of 100kHz
  //TinyWireM.setClock(400000); //Optional - set I2C SCL to High Speed Mode of 400kHz

  //mySerial.begin(9600);       //Start serial communication at 9600 for debug statements
  //mySerial.println("OpenLCD Example Code");

  //Send the reset command to the display - this forces the cursor to return to the beginning of the display
  TinyWireM.beginTransmission(DISPLAY_ADDRESS1);
  TinyWireM.write('|');         //Put LCD into setting mode
  TinyWireM.write('-');         //Send clear display command
  TinyWireM.endTransmission();

  Word = "\xcf\xd9\xdc";        //Katakana bytes for "Ma-ru-wa"
}

void loop()
{
  // Serial.print("Cycle: ");   //These serial.print statements take multiple milliseconds
  // Serial.println(cycles);

  i2cSendValue();               //Send the characters to the display
  delay(50);                    //The maximum update rate of OpenLCD is about 100Hz (10ms). A smaller delay will cause flicker
}

//i2cSendValue clears the display and sends the contents of Word out over I2C
void i2cSendValue()
{
  TinyWireM.beginTransmission(DISPLAY_ADDRESS1);  //Transmit to the display
  TinyWireM.write('|');         //Put LCD into setting mode
  TinyWireM.write('-');         //Send clear display command

  SLength = Word.length();
  for (i = 0; i < SLength; i++)
  {
    TinyWireM.send(Word[i]);
  }

  TinyWireM.endTransmission();  //Stop I2C transmission
}
Chart for conversion between Katakana and bytes. Each character's representation is \x followed by the number or letter on the horizontal axis, then the number or letter on the vertical axis.
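To give a flavor of how the database remapping works, here is an illustrative Python sketch; only a handful of characters are included (the real mapping covers the whole chart above). Voiced kana like じ are first decomposed into the base kana plus the dakuten mark, since the LCD expects the voicing mark as its own byte.

import unicodedata

# Illustrative subset of the chart above (half-width katakana byte codes)
KANA_TO_LCD = {
    'し': b'\xbc',
    'か': b'\xb6',
    'ん': b'\xdd',
    '\u3099': b'\xde',   # combining dakuten (voicing mark) gets its own byte
}

def reading_to_lcd_bytes(reading):
    # Decompose voiced kana: 'じ' -> 'し' + combining dakuten
    decomposed = unicodedata.normalize('NFD', reading)
    return b''.join(KANA_TO_LCD.get(ch, b'') for ch in decomposed)

print(reading_to_lcd_bytes('じかん'))   # b'\xbc\xde\xb6\xdd', the 時間 example above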


Switching from ATTINY to ATMEGA



I ran into some problems with regard to memory on the ATtiny. The tiny needs at least two libraries to carry out the functions I want: the SoftwareSerial library and the TinyWireM library. Having these two libraries together is impossible, as the tiny complains that there is no memory left, which is why I decided to switch to an ATmega and remodel my circuit (and of course mill a new board accordingly). Below, you can see the schematic and the PCB design for the current board I use. Note one trick I learned to use: the zero-ohm resistor trick, where if you cannot connect some traces, you can try introducing a zero-ohm resistor to help create a connection. Also note that the reset pin on the programmer header was later connected by wire to the reset pin on the mega. (There was a PCB iteration prior to this one which differed only in dimensions. The one previewed here has dimensions that fit the design I made for my device, which I will introduce in a bit.)


ATmega Board Schematic

ATmega Board PCB Design

Components for Mega Board



Design and Iteration


As a mechanical engineer, I am always reminded that before designing a product, a prototype using very cheap material should first be made as a proof of concept. Following this piece of advice, I put together a cardboard iteration of my design. Although I wanted to make a pen-like model, dimensional constraints from the LCD as well as the RPi camera size forced me to remodel my design to fit all of these constraints. The images below give a better idea of what the design looks like. After all, an image is worth a thousand words!


Side View of Cardboard Iteration

Bottom Side 1 of Cardboard Iteration

Pre-CAD analysis and design sketches.

This design was translated into a first iteration of the 3D-printed model, which actually didn't match the dimensions of the design in SolidWorks. Apparently, the 3D printer introduced additional offsets that made the print smaller than it should be. This was especially true for the holes in my design. So, I measured these offsets and adjusted the design in SolidWorks to compensate for the offsets introduced by the Stratasys.


Offsets and Re-designs on Paper

Back Design of New Design

After settling on the new design, I 3D printed the piece and started assembling the inner components in a way that still allows access to them in case I need to debug. The holes for the camera module are 2 mm in diameter, and those for the LCD and button are 2.5 mm.


Attaching LCD Screen
Inside of 3D design

LCD and Pi Camera Side

LCD and Button Side

Setup



Serial Communication: Pi and Atmega



In terms of serial communication, communication happens in two directions: 1) from the mega to the Pi, and 2) from the Pi to the mega. The first part is that when the button is pressed, a serial message in the form of a one-byte character (here I chose the character 'A') is sent from the TX line on the mega to the RX line on the Pi. Once the message is received, the Pi launches its camera and takes a picture. After the picture is taken, the Pi processes it into text through Tesseract and produces the katakana reading. The katakana reading is then transported to the mega over serial and displayed on the LCD.
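To make that flow concrete, here is a rough sketch of the Pi-side loop, assuming the pieces from the earlier sections (the serial port and baud rate are the ones I settle on below, and the one-entry KANA_DB dictionary stands in for the full kanji-to-LCD-bytes database).

import serial
import picamera
import pytesseract
from PIL import Image

KANA_DB = {'時間': b'\xbc\xde\xb6\xdd'}   # stand-in for the full database

ser = serial.Serial('/dev/ttyAMA0', baudrate=600)   # port/baud discussed below

while True:
    if ser.read(1) == b'A':                          # button press signalled by the mega
        with picamera.PiCamera() as camera:
            camera.capture('/home/pi/Desktop/image.jpg')
        text = pytesseract.image_to_string(Image.open('/home/pi/Desktop/image.jpg'), lang='jpn')
        text = ''.join(text.split())                 # strip spaces/newlines
        ser.write(KANA_DB.get(text, b'?'))           # send the LCD katakana bytes back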


Of course, before everything worked out as smoothly as described above, I had to figure out some things, like which serial port on the Pi is responsible for serial communication. The first step was to disable the serial console in the Pi's config, and then install picocom to see and interact with the serial port. To list the serial ports, you simply use the command ls /dev/tty* since the serial port names start with tty. Apparently, for the Pi Zero, the serial port that worked was ttyAMA0. This was verified with an oscilloscope by opening ttyAMA0 in picocom and then pressing the letter B: the TX line on the oscilloscope showed a signal corresponding to the letter B.



Letter 'B' on Oscilloscope

Another problem I ran into was that my serial port was not actually working on the ATmega. A close inspection with the multimeter revealed that it was a solder joint problem (the solder wasn't actually touching the mega's pin). So, I learned from this experience to always check my pin joints first when debugging: check for both continuity and the correct supply voltage. If the multimeter doesn't reveal anything wrong with the connections, then use the oscilloscope to see whether the signals are what is supposed to be received.

A piece of advice for both the future me and current readers: NEVER CONNECT A PROGRAMMER TO A PCB WHILE IT IS CONNECTED TO AN RPI. This will cause the Pi to shut down and reboot by itself, and may harm your electrical connections.

So, after I verified that my TX and RX lines were working as anticipated on the scope, it was time to test everything together with the button as well. However, whenever I pressed the button, instead of an 'A' being received as the serial message, weird blocks were printed. That was very strange to me, because the baud rate for both the mega and the Pi was specified as 9600. My speculation was that it could be an ASCII issue, but Ben checked with the scope, and apparently the mega was transmitting at a baud rate of 600! We had no idea why that was happening, so we worked around it by changing the baud rate of the Pi to 600 rather than 9600, and that fixed the problem.


The Weird Blocks!

The other problem in this section was with what the LCD displayed after the button is pressed and the serial communication and OCR take place. Ironically, Python was not reading the "\x" the way it should be read, and thus the backslash was displayed on the LCD as a yen sign (at least that's Japan-related).


Yen Sign

To work around this problem, Ben suggested decoding the "hex", so I basically wrote a Python function that iterates over the string and decodes the hex, considering only the elements after each \x, as those are the ones Python understands. The result was a success! Watch the video below to check it out! :) [Side note: I compressed the video using ffmpeg with this command ==> ffmpeg -i Desktop/kankan_1.mov -vf scale=400:-2 -vcodec libx264 -crf 20 -an Desktop/kankan_1_edited.mov]
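The function was along these lines: take the database string, which contains literal backslash-x pairs rather than real bytes, keep only the two hex digits after each \x, and turn them into the raw bytes the LCD expects (the exact code may have differed slightly).

def escaped_hex_to_bytes(s):
    # s looks like r'\xbc\xde\xb6\xdd': literal backslash-x pairs read from the text database
    pairs = s.split('\\x')[1:]                 # keep only what follows each \x
    return bytes(int(p[:2], 16) for p in pairs)

# escaped_hex_to_bytes(r'\xbc\xde\xb6\xdd') -> b'\xbc\xde\xb6\xdd'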


Akan Reading



Aesthetics and Design



The first step in design and aesthetics was coming up with a catchy name, designing a mock logo for it, and settling on the overall look. I decided to call my product/device "KanKan", since it CAN convert text from Kanji to Kana. I also decided to cover my device with vinyl and a vinyl design. The colors were based on what was available in the shop (in this case, yellow and black).


Pre-CAD analysis and design sketches.

Vinyl Parts Cut in Black

Vinyl Parts Cut in Yellow

Attaching First Vinyl Piece!

Yours Truly Attaching Vinyl Pieces to KanKan

All Yellow Vinyl Cut and Attached

My Name in Japanese and English in Black Vinyl

"Forgot to mention, but I also 3D printed a handle, which I attached with Gorilla Glue"



Cleaning up Wiring



Although this step was done before the vinyl cutting, I just thought I would document it last. So, the pseudo-last part was to clean up all the wiring. I simply cut the female-to-female wires that I had, exposed some of the metallic wire, slid a shrink tube of an appropriate size onto one side, mechanically twisted the exposed ends together to establish a mechanical connection, and soldered over it. After that, I used a heat gun to shrink the tubes.


Wire Exposed and Cut

Exposed Parts Mechanically Twisted and Soldered Together

Heat Shrink Tube is Shrunk!



Wrapping it All Together!



According to the list of tasks I had to do in the two and a half weeks prior to Final Presentation Week, I was able to meet everything on my list (Alhamdullilah!). All that was left was to see KanKan working all together in its packaged state, and it did!!


Final Checklist All Done!


I have also done testing with other kanji examples, and have chosen this one as another example.





Acknowledgments



"Alhamdullilah" first and foremost. Then, many many many thanks to my TA, Ben, for always being supportive and offering me sage advice. I learned a lot from him and aspire to be just like him one day. Also, many many many thanks to my instructor, Zach, for being very supportive and teaching me alot through his invaluable mentorship. I was very fortunate to have had those two as my mentors for this course. I would also like to thank my friend, Haripriya Mehta, for being a great support system and offering me advice on how to refine my data.

I would also like to thank Professor Neil Gershenfeld for making all of this happen. I learned a lot from his lectures and hopefully, one day, I can also inspire young makers the same way he has inspired me! :)


Thank you Everyone for Following My Journey with KanKan. Hoping to Make Greater Products and Inventions in the Future!
