corruption in the date portion of log files #40

Open
thinrope opened this issue May 10, 2016 · 3 comments

@thinrope
Member

Looking at the raw log files from Nanos, I see the invalid date string "2000-00-00T" in many places.
About 244K of 40.5M lines (0.6%) are corrupt.
The problem is present in a variety of firmware versions. Here are the top 3 of 17 affected versions (the numbers are counts of on-off log segments):

format=1.3.5nano,207
format=1.2.8nano,127
format=1.3.4nano,118

It affects 99 devices, some more than others. These are the top 10:

2140,25
2327,20
2004,11
2022,11
2303,11
2431,11
2001,8
1009,7
2320,7
2326,7

I looked at the code but couldn't spot anything obvious, and there is no literal string 2000. However, there is quite a lot of "magic hackery" with the years, e.g.:

#define DEFAULT_YEAR 2013

year = 2012, month = 0, day = 0, hour = 0, minute = 0, second = 0, hundredths = 0;

And finally:
*year += *year > 80 ? 1900 : 2000;

Recent drives with this problem (from May 2016) are 22969, 22975, and 22980 (devices 1207 and 2001). Both devices run the "format=1.3.4nano" firmware. For those 3 drives, this is the number of points on each date:

1987-02-00  7
1987-05-00  43
1987-28-00  14
2000-00-00  462
2003-00-00  42
2004-00-00  13
2006-00-00  38
2010-00-00  2
2016-03-27  4771
2016-05-01  1001
2016-05-03  415
2016-05-04  3364
2016-05-05  4213
2030-16-00  42
2035-04-00  45
2051-05-00  10
2052-00-00  21
2061-00-00  55
2080-01-05  5
2080-01-06  11
2080-01-10  3

While the 2000-00-00 case accounts for a higher share than the others, it really smells like memory corruption to me.

@thinrope
Member Author

Looking only at the dates: 40325908 lines were extracted, and 475066 of them have invalid dates (including dates before 3/11 and after today), so about 1.18% of the data is corrupt due to date problems alone. The most frequent invalid dates all have "00-00" in the month-day section, and those account for 446908 lines, or 94% of the bad dates.

So maybe it is not memory corruption after all, but bad logic somewhere :-|
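The kind of date check described in this comment could be sketched roughly as follows. This is not the script that produced the numbers above; the function names, the `YYYYMMDD` integer encoding, and the caller-supplied date bounds are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Pack a date into a comparable integer, e.g. 2016-05-10 -> 20160510. */
static long date_key(int y, int m, int d)
{
    return (long)y * 10000L + (long)m * 100L + (long)d;
}

/* Returns 1 if `stamp` starts with YYYY-MM-DD, the month and day are in
 * range, and the date lies within [min_key, max_key]; otherwise 0.
 * Days are only checked against 1..31, not against the month length. */
static int date_is_plausible(const char *stamp, long min_key, long max_key)
{
    int y, m, d;
    long key;

    assert(stamp != NULL);
    if (sscanf(stamp, "%4d-%2d-%2d", &y, &m, &d) != 3)
        return 0;
    if (m < 1 || m > 12 || d < 1 || d > 31)
        return 0;
    key = date_key(y, m, d);
    return key >= min_key && key <= max_key;
}

/* Returns 1 for the dominant failure pattern: month and day both zero. */
static int date_has_zero_month_day(const char *stamp)
{
    int y, m, d;

    assert(stamp != NULL);
    if (sscanf(stamp, "%4d-%2d-%2d", &y, &m, &d) != 3)
        return 0;
    return m == 0 && d == 0;
}
```

For example, with hypothetical bounds of 20110101 and 20160510, "2016-05-05T" passes, while "2000-00-00T", "1987-28-00" and "2080-01-05" are rejected, and only the first of those three matches the "00-00" pattern.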

@fakufaku
Member

Hi Kalin, part of the explanation (the last bit of magic) is that the GPS year is given as two digits (god knows why) that start at 80 (1980). Some of the GPS modules default to 80 before they first acquire a date.
I am guessing that another part of the explanation is that different modules default to different years (just maybe).
For the rest, maybe a memory leak? I wouldn't be surprised given how packed the firmware is.

@thinrope
Member Author

Yes, I know about the NMEA mis-design ;-) I also suspected something like memory corruption, but seeing that in 94% of the cases we end up with 00 in the month/day, I am 94% sure it is a logic mistake, possibly a missing error check or similar, since we initialize those values to 0 before passing them by reference. If I had to debug that, I'd initialize them to 99 or 33 instead and look for that value, as well as for 00.
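The sentinel idea could look roughly like the sketch below. `parse_ddmmyy` is a hypothetical stand-in for the firmware's date parser (the real one is not shown in this thread), and `DATE_SENTINEL` and `date_untouched` are likewise made-up names. The point is only that an impossible initial value separates "the parser never wrote the field" from "the parser wrote 0".

```c
#include <assert.h>
#include <stdio.h>

#define DATE_SENTINEL 99 /* impossible month/day value, as suggested above */

/* Hypothetical parser: writes the outputs only if the field starts with
 * three two-digit numbers (ddmmyy, as in the NMEA date field).
 * Returns 1 on success, 0 on failure; on failure the outputs are untouched. */
static int parse_ddmmyy(const char *field, int *day, int *month, int *year)
{
    int d, m, y;

    assert(field && day && month && year);
    if (sscanf(field, "%2d%2d%2d", &d, &m, &y) != 3)
        return 0;
    *day = d;
    *month = m;
    *year = y;
    return 1;
}

/* Returns 1 if month or day still holds the sentinel after parsing, i.e.
 * the parser bailed out without writing them and the caller ignored it. */
static int date_untouched(const char *field)
{
    int day = DATE_SENTINEL, month = DATE_SENTINEL, year = 0;

    (void)parse_ddmmyy(field, &day, &month, &year); /* return value ignored on purpose */
    return day == DATE_SENTINEL || month == DATE_SENTINEL;
}
```

Here `date_untouched("100516")` returns 0 and `date_untouched("")` returns 1. With the current zero initialization, the second case would instead silently yield day 0 and month 0, which is indistinguishable from a parser that actively wrote zeros.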


2 participants