Use some pretrained neural nets to decode Mnist characters
You can use a cut down version of Mnist with only 10,000 characters in it. This can help while you learn as the runs will execute much more quickly and memory errors will be much less likely.
You can also choose to create a classifier or a predictor.
A classifier believes that the answer and the truth are discrete variables and is rewarded for getting the answer right. It uses the visual similarity to achieve that
A predictor believes the answer and the truth are continuous varables. It's rewarded for predicting an answer that is numerically close to the truth. It uses the visual similarity to achieve that.