The problems of parsing shitty PGN files

PGN is a terrible format. Chess notations are also terrible. Everything is easier for a computer than parsing chess notation! Seriously, this stuff is so non-trivial that you need to read the tokens from game start to game end, then analyse them in the same direction but interpret them backwards, because all the optional information comes BEFORE the necessary parts. How fucking stupid must one butthead be to invent such A FUCKING SILLY SHIT. Goddamnit, I invested so much time into it and now everything's useless because there's another stupid rule that is possible. I HATE CHEEEESSSSSSSSSSSSSSSSSSSSS. I don't even know where to continue from here. The problem is that I know there are exceptions in the files I need to parse, so I can't just write something that assumes everything is written in the correct format. I even stopped working on it, which is fatal when it's a mandatory component of your project. I couldn't even sleep last night because of this problem. I just don't know how to solve it with less work. I'm desperate and confused, nothing is going right here. I WILL kill the guy who invented it if I ever meet him.

It's such a useless junk format. The problem is the move notation, which lists all rounds and turns of the game including comments, annotations, alternate moves and the game end itself. I got a parser for a single turn done, but it only works if you already have the full turn, because it reads it backwards (I wasn't able to figure out a way to read from start to end due to this ridiculous order of optional fields). Anyway, the order in which stuff can appear is:

  • round number and '.'
  • white's turn
  • optional whitespace
  • optional comment or alternate turn followed by an optional repetition of the current round number with one or three '.' after it (optional whitespace between all these)
  • black's turn
  • optional whitespace
  • optional comment or alternate turn followed by an optional repetition of the current round number with one or three '.' after it (optional whitespace between all these)
  • optional whitespace
  • repetition of this list or one of the following tokens indicating the game end: "*", "1-0", "0-1" or "1/2-1/2"
Also, you can add annotations and comments anywhere between whitespace, I think. I don't know.
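
For what it's worth, here is a rough sketch in Python of how the order above could be read strictly from start to end, without going backwards. It is not a full PGN parser and rests on a few assumptions that the list does not confirm: comments are written as {...}, alternate turns as (...), annotations as $ plus a number, and no '}' shows up inside a comment. The names TOKEN_RE, tokenize and main_line are made up for this example. The idea is to cut the text into tokens first and then simply drop everything optional:

```python
import re

# Token patterns, tried in this order at every position.
# Assumptions of this sketch: {...} comments, (...) alternate turns,
# $<digits> annotations, and no '}' inside a comment.
TOKEN_RE = re.compile(r"""
    (?P<comment>\{[^}]*\})             # {free text comment}
  | (?P<open>\()                       # start of an alternate turn
  | (?P<close>\))                      # end of an alternate turn
  | (?P<result>1-0|0-1|1/2-1/2|\*)     # game end token
  | (?P<number>\d+\.+|\d+(?=\s|$))     # round number, with or without dots
  | (?P<nag>\$\d+)                     # numeric annotation
  | (?P<move>[^\s(){}$]+)              # anything else counts as a turn
""", re.VERBOSE)


def tokenize(movetext):
    """Yield (kind, text) pairs from left to right; whitespace is skipped."""
    for match in TOKEN_RE.finditer(movetext):
        yield match.lastgroup, match.group()


def main_line(movetext):
    """Return (turns, result) of the main line, ignoring all optional parts."""
    turns, result, depth = [], None, 0
    for kind, text in tokenize(movetext):
        if kind == "open":
            depth += 1          # entering an alternate turn, possibly nested
        elif kind == "close":
            depth -= 1
        elif depth > 0:
            continue            # still inside an alternate turn
        elif kind == "move":
            turns.append(text)
        elif kind == "result":
            result = text
        # comments, round numbers and annotations are dropped on purpose
    return turns, result


sample = ("1. e4 {best by test} e5 2. Nf3 (2. f4 exf4 {King's Gambit}) "
          "2... Nc6 $1 3. Bb5 1/2-1/2")
turns, result = main_line(sample)
print(turns, result)
```

Since the whitespace and the repeated round numbers are thrown away while tokenizing, white's turns end up at the even positions of the list and black's at the odd ones, at least for a game that starts with white's turn. Files that break the assumed conventions would still need extra handling.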

It's just ridiculous to parse, especially with all this whitespace in between. Maybe it's just me having worked on it too much. Maybe I just can't see the solution. Maybe I should take a break, but I don't have the time for that. There are only a few weeks until the presentation and I need to get this done... And then there's the problem that it's keeping me from sleeping. A circle of doom, nothing more. Maybe I should check again whether the format is stricter than I think, so that I can create a simpler structure to parse. It would be easier if it weren't such an implicit structure with so many optional parts in it.

However, I think I should put it away for a day or two until I have a clear head and a quiet place to work on it. I guess I'm just not getting anywhere at the moment... It's not my time. Yes, let's do something else to relax. I can't really sleep at the moment, so I'll probably just put stuff into my head to give maximum flower power or something like that.
