Hi Morgan, and welcome back,

First, I am not that familiar with the 18F (yet), but I have been through similar cases with other micros.

1. Try to reduce the code more and more until the problem goes away, and then add things back to see where the problem is.

2. Mostly when I have had these kinds of problems, it has been the growing stack that has overwritten static structures and variables in memory. This can be kind of fun, since the problem appears to be somewhere quite different from where it really is.

3. Uninitialized variables that mostly have the right value to start with but sometimes, depending on prior use of the memory, do not. (This may differ between debug and release builds, since debug might initialize allocated memory while release won't, or not to the same value.)

4. Can the code be simulated? Does the simulator show the same symptoms?

5. Does it show up in both debug and release? (Does this distinction even exist on 18F compilers?)

6. If all else fails, a realtime trace would show you exactly what happens. A real emulator could perhaps be borrowed or rented for a shorter time. I know, it costs a lot and takes time to learn how to handle, but it is invaluable in these kinds of situations.

7. Could it be hardware related? Does your code perhaps manipulate hardware in a way that could be dangerous to your micro, i.e. a momentary state that would draw a lot of current, causing a short surge on the power supply? No shielding would help here. Are all pins on the micro operating within absolute maximum ratings? No current into protection diodes?

8. Overflow in intermediate variables?

9. Static buildup on moving parts, with tiny discharges causing hiccups? (I think I have this in a propeller clock circuit.)

10. I also agree with Harold that you should make sure that all functions have the expected signature everywhere: return type, parameter types, parameter passing methods, big/little endian use...
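To illustrate point 8, here is a minimal sketch in plain C (using <stdint.h>, nothing CCS specific). The ADC scaling, the function names and the constants are made up for the example and are not taken from your project. The "narrow" version models a compiler that keeps the intermediate product only 16 bits wide; the "wide" version forces a 32-bit intermediate. Both agree for small inputs, so the bug only shows up occasionally, when the input is large enough for the product to wrap.

```c
#include <stdint.h>

/* Hypothetical scaling: convert a 12-bit ADC count (0..4095) to
 * millivolts, assuming a 5000 mV reference: mv = counts * 5000 / 4096. */

/* Models arithmetic that stays 16 bits wide: the product is truncated
 * to 16 bits (wraps modulo 65536) before the division takes place. */
uint16_t adc_to_mv_narrow(uint16_t counts)
{
    uint16_t product = (uint16_t)((uint32_t)counts * 5000u);
    return (uint16_t)(product / 4096u);
}

/* Same formula, but the intermediate product is kept in 32 bits,
 * so it cannot wrap for any 12-bit input. */
uint16_t adc_to_mv_wide(uint16_t counts)
{
    uint32_t product = (uint32_t)counts * 5000u;
    return (uint16_t)(product / 4096u);
}
```

With counts = 10 both functions return 12, but with counts = 4095 the wide version returns 4998 while the narrow one returns 6. If something like this is suspected, checking the width the compiler actually uses for each intermediate expression (and casting explicitly) is a cheap test.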
Good luck
/Ruben

> Hi
>
> We are having severe trouble with PIC18 programmed in C.
>
> Is there anybody who has experienced anything odd, like functions
> returning the wrong value, if statements evaluating erroneously, suspected
> unprovoked jumps... *occasionally*, while everything works OK during most
> executions of the same parts of code, then after a random time - bang!??
>
> This is driving us nuts, me (hardware designer and assembler guy) and my
> colleague who does all the programming, in C (I myself am not very C literate).
>
> To start from the beginning:
> We have designed two circuit boards. One is a measurement and user I/O slave
> based on PIC16F883 with a couple of precision A/D converters and other I/O,
> and the main control board uses PIC18F66J15.
>
> Both are programmed in C using the CCS PCWH4.x compiler.
>
> The PIC16 board works perfectly, so it seems we are not too stupid ;)
>
> The PIC18, on the other hand, works mostly, but once in some million
> instructions or so it does some bad thing.
> We cannot explain the cause, and we have wasted weeks on this now and are
> already past the intended deadline... Simply, when we loaded the first
> program version we were happy to see it worked at all: bidirectional comms
> timesharing power on one line, also serial communication to a VFC motor
> driver, and some logic between. But... occasionally it just goes wrong.
>
> For example, we caught this behaviour using ICD2 and extra code to sample
> variables:
> We shut off interrupts, then execute a part of the program that calls a
> function. The function always evaluates correctly, but *occasionally* the
> returned value is *not the evaluated one* - theoretically the function
> cannot even return the value we got at the receiving end!
> AND THIS IS WITH INTERRUPTS SHUT OFF!!!!
> Also, we could not find an error in the generated assembly code.
> At that point, we thought it must be EMI or bad hardware.
>
> But...
> we eliminated all hardware and EMI problems:
>
> o We tried moving the PCB away from the VFC, putting it in a metal box,
>   shielding the processor with copper foil, supplying it from batteries
>   instead of the switchmode converter, and putting toroids on the cables.
>   We also tried cooling it far below freezing point and heating it with a
>   hair dryer, and varied the CPU voltage (core and I/O) to max and min.
>   All this had ZERO impact on the error rate.
>
> o We also scoped VDDCORE (which is also VDD) and found it perfectly fine.
>   We added extra capacitors of other types and even a damping R-C, just in
>   case. Still no difference.
>
> o We changed to another individual PIC - same behaviour.
>
> o We changed the design to use the similar PIC18F6722 (higher voltage) on
>   another circuit board: same problems (plus more, this chip has a bigger
>   errata...).
>
> o We also asked Microchip support whether any execution bugs are known in
>   this chip. The answer is no.
>
> So we rule out hardware problems.
>
> My colleague has found and corrected some errors of his own in the source
> code, but the basic problem I have described has still not been found.
>
> But we also cannot understand how a problem that expresses itself so
> randomly can be related to our own code.
> Or to the compiler, for that matter.
> So if we rule out the PIC chip, surrounding hardware, compiler and source
> code, what is left?
> Nothing?
> Still this stupid behaviour!! AAAAAH.
>
> We have analysed parts of what the compiler has generated.
> Some parts are smart, some very clumsy, but not really wrong.
> We changed between a few 4.x compiler versions and also ported the code to
> the 3.x compiler, as a lot of users still prefer that and consider 4.x to
> still be in beta. We still have about the same problem.
>
> The problem seems to wander as we insert debug code.
> We have even seen simple if statements go wrong!! Occasionally.
> The error mostly happens when the system operation mode is changing, when
> there are a lot of variables changing - but sometimes it just sits there
> and changes operation mode by itself...
>
> It seems like there is some ghost throwing dice and rewriting some
> register randomly, and/or causing the program to jump and/or return to the
> wrong address after a call.
>
> Even with interrupts shut off we have observed "something" hitting us.
> The most spooky thing is that in one code setup the PortB interrupt always
> fired after a timer interrupt, although we could not find anything in
> hardware or code that would make it do that.
>
> We are pondering a switch to the C18 compiler. Maybe the problem is not the
> CCS compiler, but the rewrite might make us find a source code problem,
> plus C18 supports REAL-ICE for better debugging. But it is time consuming
> to port from CCS C and we are at the deadline already, so a direct fix
> would be much better.
>
> The erratas we have found are
> http://ww1.microchip.com/downloads/en/DeviceDoc/80246b.pdf
> http://ww1.microchip.com/downloads/en/DeviceDoc/80315a.pdf
> It was not easy to find both. Maybe we missed more erratas?
>
> Our main thread on the CCS forum on this:
> http://www.ccsinfo.com/forum/viewtopic.php?t=31672
> "zilog" is my programming colleague on this project.
>
> We got a lot of help there, but nothing that found the problem.
> Remembering the wisdom here on PICLIST, I now turn to you ;)
>
> --
> Morgan Olsson
>
> --
> http://www.piclist.com PIC/SX FAQ & list archive
> View/change your membership options at
> http://mailman.mit.edu/mailman/listinfo/piclist

==============================
Ruben Jönsson
AB Liros Electronic
Box 9124, 200 39 Malmö, Sweden
TEL INT +46 40142078
FAX INT +46 40947388
ruben@pp.sbbs.se
==============================