Issues with people not being
able to reach the site, except through proxies or anonymizers
-
199912291515-0700
-
The techref and piclist.com sites were down from sometime after 199912291515
to 199912300700, the first time since installation. M$ ain't so bad!
-
200002022200-0630
-
Yeah, the damn thing screwed itself good last night.... Second time since
we started. Blue screen by this morning and a lot of people said they couldn't
get in. And the DNS server (a third party) had a problem yesterday. Sorry
about the hassle. It's up again now.
The good news is PacBell finally got our 384kbps DSL connection up at a location
where the server can be monitored 24/7. The current server only has human
companionship <GRIN> 06:30 to 15:30 daily. We are testing the connection
for reliability and setting up a new server machine. Then we will register
a nicer domain name for the techref and move the techref and piclist sites.
The old server will continue for other things and refer techref and piclist
people to the new address.
I'm also experimenting with Linux (Red Hat 6.1) as a router and backup web
server. I've seen some interesting things done where an NT box and a Linux
box ping each other and if one dies, the other takes over. The idea is that
even if something external (hacker, virus, anything that triggers an OS flaw,
etc...) kills one, it's very unlikely to kill the other. Since I use a lot
of ASP and 32-bit machine language MASM code, in my case the Linux box will
only have a "We are experiencing technical difficulties" web page and it
will spend its time screaming (pager, audio alarm, phone calls, etc...) for
help and maybe rebooting the NT server. So far the Red Hat installer has
some kind of security problem with the partition table, but I've not had
time to really troubleshoot it.
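The "two boxes ping each other" idea above can be sketched roughly as follows. This is only an illustration in Python, not what actually runs here: the `Watchdog` class, the `probe` callback, and the three-miss threshold are all assumptions made up for the example. The point is just that the backup box should wait for several missed replies in a row before it decides the other box is dead.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Watchdog:
    """Tracks consecutive missed heartbeats from a peer server (hypothetical helper)."""
    probe: Callable[[], bool]   # assumed callback: returns True if the peer answered
    max_misses: int = 3         # assumed threshold, not taken from the real setup
    misses: int = 0

    def tick(self) -> str:
        """Run one probe and return the action the backup box should take."""
        if self.probe():
            self.misses = 0
            return "ok"
        self.misses += 1
        if self.misses < self.max_misses:
            return "wait"       # one lost ping is not enough to act on
        return "take_over"      # serve the apology page, scream for help, try a reboot


def simulate(replies: List[bool], max_misses: int = 3) -> List[str]:
    """Feed a canned sequence of probe results through the watchdog."""
    it = iter(replies)
    dog = Watchdog(probe=lambda: next(it), max_misses=max_misses)
    return [dog.tick() for _ in replies]
```

Calling `simulate([True, False, False, False, True])` walks through a healthy reply, three misses, and a recovery, so the counter reset can be seen as well as the takeover.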
-
200003232200-0616
-
Locked up at 10 o'clock. No idea why. There is a defragment daemon that starts
at that time on Wednesdays. I'll defragment manually and see if that has
a problem.
-
200109280530-0930
-
ISP service outage. Road construction nearby cut the cable. First time in
years.
-
20010917-now
-
All last week I've been having horrible problems with the server getting
overloaded. % Processor Usage goes to 100% and stays. Web site response is
more and more sluggish and the number of users connected climbs to insane
levels. The TaskList shows that inetinfo or mtx is taking 99% of the available
cycles. Using Performance Monitor to watch the threads of those processes
shows that different threads are maxing out at different times. Running
windbg, attaching to the process and viewing a stack dump for the maxed out
threads always shows that MSVCRT.DLL is running. No info in MSKB related to
that DLL other than that it is the Visual C runtime library. I don't run any
Visual C code. Tried shutting off all CGI-BIN exe's, upgrading that DLL, etc...
No effect. Stopping the site or the entire web service has no effect (!).
And then... We notice that stopping the
content index service frees up the processor and disconnects most of the
connected users. No content index corrupt messages... server ran fine
for an entire day with the search functions disabled. Rebuilding the content
index had no effect. I've disabled the search functions (but kept the index
server running) and the processor never makes it above 5%! And I've served
more pages than ever! Doing one or two searches at a time (locally)
sends the processor up to about 80 or 90% for the duration of the search.
Doing 10 searches at once pegs the meter and it never recovers.
So... the secret is: Don't run a busy NT web server with more than a Gig
of text in the search engine unless you throttle its use. I'm frantically
searching for a way to do that. For now, it just records the starting time
of each search and will not allow another one for a number of seconds after
that.
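The stopgap described above (remember when the last search started, refuse another one for a few seconds) looks roughly like this. It is a sketch in Python rather than the ASP the site actually uses, and the class name, the `min_gap_seconds` parameter, and the injectable `clock` are all assumptions for the example.

```python
import time
from typing import Callable, Optional


class SearchThrottle:
    """Allow a search only if enough time has passed since the last one started."""

    def __init__(self, min_gap_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.min_gap_seconds = min_gap_seconds  # assumed setting, e.g. a few seconds
        self.clock = clock                      # injectable so it can be tested
        self.last_start: Optional[float] = None

    def try_start(self) -> bool:
        """Return True and record the start time if a search may run now."""
        now = self.clock()
        if self.last_start is not None and now - self.last_start < self.min_gap_seconds:
            return False                        # too soon: ask the visitor to retry
        self.last_start = now
        return True
```

A page would call `try_start()` before handing the query to the index engine and show a "please try again in a moment" message when it returns False. Note that this limits how often searches start, not how many run at once, which matches the workaround above but is not a true concurrency limit.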
A new, faster server would allow more searches, so I'm adding requests for support
on all the search pages and during the time the search is running. If that
doesn't generate some funds for a faster server, I'll look for sponsorship
or advertising for the search pages ONLY! The main site will
always be free and accessible.
-
200112272010-200112280935
-
ISP service outage. Road construction nearby cut the cable. Second time
(for the same reason) in years.
-
20020110
-
Moved from Rancho Bernardo to Temecula. DSL is the only option. Verizon has
us on a 384k up and 384k down modem.
-
20020112
-
Second day and the DSL connection slowed to a crawl. Support at Verizon will
not talk to us unless we disconnect our router and connect only one PC. After
verifying that the connection is at about 64k, a real tech was called and
found that our account was configured incorrectly in one of the three systems
that control connections.
-
20020125
-
A lineman decided that our wires were not punched down neatly enough at the
main terminal block for the building, called up the main number, got a girl
in the office and told her (not asked) that the phones would be down for
2 minutes. He then ripped us out and re-wired us... DSL, fax lines, and all.
Took about 10 minutes.
-
20020226
-
Down for a while this am at the office: First it looked like the router might
have locked up as I couldn't reach it on the internal network... but the
DSL modem was also flashing its Data light which according to the manual
is abnormal. I power cycled both and all was well... for a while. Then
the data light went out and the modem light on the DSL modem started blinking
and that lasted for about 10 min and 2 power cycles. Sat on hold for 7 min
for Tech support and they couldn't get the modem back up so they put in a
trouble ticket, and about a minute later, it came back up. Tech got here
just before closing and it had worked all day...
-
20020321
-
Up and down this afternoon from 2 to 3 at the office. DSL modem would flash
its data or modem lights on and off. Support doesn't want to send a tech
because it started working while I was on the phone with them.
-
20020322
-
Up and down this afternoon from 12 to 1 at the office. DSL modem would flash
its data or modem lights on and off. DSL sucks. Support will send a tech.
The tech found a bad connection on the breakout panel in the back of our
suite. Seems ok for now.
-
200301061324-200301061509
-
After a long time of not having any problems, I apparently set the firewall
box too close to a server monitor and it overheated... I know that internet
access stopped suddenly at 1:24pm, and I could not access the firewall config
page. I found the top of the unit was very hot and the vents were being blocked
by the side of the monitor. After verifying that the problem was not with
the DSL modem, I reset the firewall to factory defaults and started setting
it up again (note to self, print out and keep handy all the settings... Doh!).
Finally back up at 3:09pm
-
20030517-2003051909
-
The SQL server took a dump. BSOD. Restarted and all is well. The only effect
at this point is the email archive, but as I add more ecom stuff, that could
cause real problems. Anytime you add more machines to a solution, the risk
of failure increases. Sigh.
-
20030722
-
Re-arranging some network cables, I kicked the power cord out of the back
of the server not once, but twice. Doh!
-
20040213092344
-
I applied some updates from $MS last night, and restarted, and then about
11pm there was a big windmill, then a spike in the index service cpu usage
(not unusual) and the inetinfo service started sucking all the available
cycles. Committed bytes slowly climbed to about double the norm and available
bytes oscillated wildly. Network utilization was also spiking with the windmills.
This AM the server was reporting 500 Internal Server error on piclist, sxlist
and massmind and running the others really slow. By the time I got to the
office (9am) the system was still sluggish, but task manager did not show
any unusual activity. The web sites were now reporting HTTP 1.1 Application
Restarting. I restarted the machine and all seems to be well.
-
2004 08/12 19-2004 08/13 10
-
Power spikes from a local lightning storm apparently whacked a chunk out
of the hard drive. No major damage, NTFS was up to the challenge.
-
2004 09/08 pm-2004 09/10 am
-
Well, the NTFS can only do so much I guess. <grin> The hard drive fried
last night. I took it as a sign to move to the new server a bit early. I've
been getting it ready but there are some rough edges... let me know if you
find one.
I had good backups from the old server, but it took a while to get them all
together. I made the mistake of using the new server as a development platform
to try to work out some new features and solve old issues, but that meant that
it was not in sync with the old server. And once I got that sorted out, there
were the standard weird problems to work through:
-
SEARCH DISABLED: I get runtime error '800a01ad' and can't create object
"ixsso.Query"
Stopping and restarting the index service makes no difference
ixsso.dll is present v 5.00.1696.1
Dependency Walker says it is having a problem with query.dll over the URLencode
export.
Query.dll is present and other exports from it seem ok, its version is
5.00.1782.4
Trying to re-install IIS4 from option pack 4.0. The original query.dll was
5.00.1696.1 which looks like a better match...
Arrgh... I can't copy it in since the original query.dll is in use. How do
I do that?
ahhh! from winnetmag.com article 14585:
Start the registry editor (regedt32.exe not regedit.exe)
Move to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager
Double click on PendingFileRenameOperations (or create it as a multi-string value, REG_MULTI_SZ, if it does not exist)
On the first line is the name of the file that will be replacing the current file with \??\ in front, e.g.
\??\d:\time\ntfs.sys
On the second line is the file to be replaced with !\??\ in front, e.g.
!\??\d:\winnt\system32\drivers\ntfs.sys
Click OK
Ok, that is much better!
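To keep the two prefixes straight, here is a small Python sketch that only builds the pair of lines described in the steps above. It does not touch the registry. The function name is made up for the example, and the line order simply follows the quoted article, so check it against the article before relying on it.

```python
from typing import List

NT_PREFIX = "\\??\\"  # the literal \??\ that goes in front of each path


def pending_rename_lines(replacement: str, target: str) -> List[str]:
    """Return the two lines for one replace-on-reboot entry (hypothetical helper).

    replacement: full path of the new copy of the file
    target:      full path of the in-use file to be replaced at the next boot
    """
    return [NT_PREFIX + replacement, "!" + NT_PREFIX + target]
```

With the example paths from the article, `pending_rename_lines(r"d:\time\ntfs.sys", r"d:\winnt\system32\drivers\ntfs.sys")` gives back the same two lines shown in the steps, ready to paste into regedt32.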
-
Performance monitor can't see the web service.
-
It seems to be ok now and the new server is cranking nicely.
-
2005 02/12
-
"HTTP 1.1 Application Restarting" started about 23:00 and continued even
after the regular restart a few hours later. I finally drove in to the office
at around 14:00 on the 13th. I started and stopped the site, tried to stop
the web service (not responding) and restarted the PC. Still the same. And
then I realized that I had been editing an include file used by default.asp
from another PC via the network and had left that file open. I closed it and
suddenly all was well. Apparently, IIS was trying to rebuild the application
and couldn't get access to that file because it was locked by the other PC?
Just my guess. Note to self: Don't leave important files open!
-
2005 08/06-07
-
Another of the lovely power fail / shutdown started / power back on before
shut down complete / system doesn't restart hang-ups. APC with NT is just
lovely.
-
2006 07/12-14
-
Another of the lovely power fail / shutdown started / power back on before
shut down complete / system doesn't restart hang-ups. APC with NT is just
lovely. Happens every year when the heat waves come and the power grid can't
keep up with the Air Conditioners.
-
2006 08/11-12
-
Started getting this "Couldn't load application" error on the web site. Event
log is full of
DCOM got error "Logon failure: unknown user name or bad password. " and was
unable to logon MASSMIND\IWAM_XX in order to run the server:
{3DFEFADE-B61A-4096-90D8-F3C0F137D9BB}
and
The server failed to load application '/LM/W3SVC/2/Root/xxxx'. The error
was '80080005'.
where xxxx was each of the applications in turn. E.g. techref, dict, and
so on.
I deleted and re-created each of the applications for the virtual directories
that have them. That seemed to solve the problem. No freaking idea what caused
it except that I had been playing around with the old web server and I think
it synchronized the SAM database and changed the password on that account.
-
2006 08/25
-
None of the web services are running after the normal restart this AM. And
they won't start from the Internet Service Control Panel. The WWW service
is running, so I stopped it and started it and then the web sites, etc...
could be started in the ISCP. Event log shows: Event ID: 4098 Source: Transaction
Server
The run-time environment has detected the absence of a critical resource
and has caused the process that hosted it to terminate. HRESULT: 80070006
(Microsoft Transaction Server Internals Information: File:
x:\viper\src\runtime\mtxex\vipthrd.cpp, Line: 862)
I can't seem to find anything about that on the net... Nothing weird in the
log files, it just seems that something didn't start right this AM. I hate crap
like this. Yesterday and the day before I was seeing really high counts on
"Current Anonymous User" without a corresponding increase in traffic.
-
2006 08/28
-
Happened to notice that the site was running sort of slow and the processor
seemed to be working harder than usual. After looking around, I found that
none of the applications were running in "separate memory space (isolated
process)". Oh... no problem, just some leftover from the issues of a day
or so ago, so I'll just turn that on... And then the web site was down.
"Application failed to load"
Event log is filling up with 80004005 "failed to load"
Oh boy, oh boy... turn that back off... Ok for now. But why can't I run in
separate memory space? Anyway, long story short:
http://support.microsoft.com/kb/297989/
Was the deal. After I got that worked out, all was well. Not only the IUSR
but also the IWAM accounts had gotten their passwords changed by the backup
domain controller and they needed to be reset. I also ended up having to
delete two of the packages out of the Transaction Server \ Computers \ My
Computer \ Packages Installed list and re-create them via the Internet Service
Manager. Not sure what was wrong with them, but they would not accept the
separate memory space option: it would just delete the application when I
turned that on.
Fun, fun, fun.
-
2007 08/04
-
Permanent "Application Restarting" again. As far as I can tell, the backup
(which runs from another server) froze with the application .asa file open
(open for read not write, but IIS seems to have issues with anyone else even
looking at that file) and restarting both machines did the trick.
Looking back, it appears August is a bad month for my web server...
-
2007 08/16
-
Power failure. UPS held the server up for about 20 minutes and then it died.
The power came back on about an hour later... for about 2 minutes... Then
went down for another hour and a half or so. Finally came back on around
5 and stayed up. August sucks. One of these years, I have to move to a hosted
server.
-
2008 02/21
-
Hard drive lost some chunks and for some reason the server didn't restart
on its own, so it didn't start working on the standard restart chkdsk until
I restarted it in the morning around 9am. And the chkdsk ran REALLY slow
for some reason (probably because it was finding and correcting the problems)
so it didn't get done until around 3pm. Yeap: Fried the entire day. And me
biting my nails to see if the drive was toast. Once it finally finished,
everything reports good and the server is back up.
-
2008 07/17
-
Power failure. The Tripp Lite rack mount 750 kept the server up for about 10 minutes, but the monitoring software on the servers was set up to shut down after a few minutes and totally failed to do so. I happened to be in the office and so I shut down manually after discovering that the monitoring agent service was not running and could not be started. It had apparently failed after an IMF update had been installed earlier that day. So why did it fail? And why didn't the monitoring agent (which was running) log or broadcast any error message when its service wasn't running? Details will be posted after Tripp Lite support has had a chance to respond.
Comments:
-
-
-
-
grooveee@optushome.com.au
just a thought for you: if the search engine is the hard part, the simplest
answer is to let somebody else do it ;-> say perhaps google? they have
a search within a site type thing (eg the microchip site). also what search
algorithm is used here? i have some skill with ASP, i'll help if I can. (a thought
that springs to mind is to dump the MS indexing stuff and put the search stuff
into a MySQL database; that should be *plenty* fast.)
-
James Newton replies: Thanks for your suggestions: Yes, you can use google
to search the parts of the site that they have bothered to index. Unfortunately,
that doesn't seem to cover much.
The algorithm is proprietary to Microsoft and inside the search engine so
I have no idea what it is.
The one huge advantage of the MS Index engine is that it gets a notification
of any changes from the OS itself... e.g. it does not have to actually index
the site to know what has changed. This alone is a massive advantage. Since
the site is now approaching 3 GB of content and is constantly growing (about
200 new documents or changes per day) and the server is a very slow old Celeron,
my efforts now are, I think correctly, directed to hardware upgrades... The
new server has arrived, and I just need to get the OS installed on it and
move the site over.
Thanks again for your offer to help...
-
nomel@techie.com
" www.nortonantivirus.com I think it's a virus... Try Norton, update to the
most current virus definitions... "
James answers: Thanks for the input Nomel, but I have the latest Norton on
the machine. And I found the problem was in the Content Index as seen above.
I was surprised that others haven't run into this... And I do appreciate
your concern.
Questions:
-
asks:
I'm sure you have a lot of time invested in the current search engine but
have you given any thought to maybe porting it to a *NIX type system? In
my experience *NIX systems scale much better than NT, not to mention handle
load better. Using a very simple shared SCSI system with OpenGFS (Global
File System) you could add more servers with ease. I want to make it perfectly
clear I'm not bashing the current system it is just that even (especially)
you seem to understand that something must be done. Would you be open to the
idea of other list members giving their constructive input on how the system
could be upgraded? --adam
James Newton answers:
I did try with Linux (Red Hat) but I was spending all my free time
(of which there isn't much) either A) learning how to patch / hack / compile
/ etc.. the OS or B) keeping the script kiddies out. I asked for volunteers
to help, and I even offered to pay or turn the use of the box over to another
in return for hosting my site... I had no takers until just last week and
then only to help a friend solve an email problem. I want to spend my time
concentrating on the content and MS NT has been very, very good to me on
that point. My uptime on this little box has been quite good.
As far as I can see; With *nix, you pay for the people or time and with NT
you pay for the OS.
Says:
" That's what you get for using Linux. :) Try a
real server OS, like FreeBSD (or any of the BSDs for that matter). "
Andrew Wilson Says:
What are the specs on the box you're serving with? What's
bandwidth usage for the piclist server? Have you ever explored the possibility
of mirroring the site and round-robin..ing the dns? I'd like to help if I
can, I think piclist is a great resource (if only it were easier to access
the info..) Feel free to respond here or via email at wil1(..at..)umbc.edu
James Newton replies: This is pretty normal volume: It seems to grow at
a rate of about 50k per month on average.
Hits       Bytes          Visits   PViews   Date
29,061     65,586,744     3,720    13,217   21 Mar
46,723     105,378,142    5,733    21,479   22 Mar
42,573     95,996,292     5,941    18,481   23 Mar
50,140     113,094,810    6,040    22,739   24 Mar
43,310     97,862,740     5,846    21,123   25 Mar

Hits       Bytes          Visits   PViews   Month
1,918,843  4,438,804,622  131,238  638,653  Jan 2004
1,078,755  2,434,471,520  119,662  490,111  Feb 2004
1,353,173  3,052,974,442  129,288  600,971  Mar 2004

Hits       Bytes          Visits   PViews   Month     Unique IPs
2,515,640  28.56 GB       168,732  782,610  Mar 2007  116,454
2,140,602  24.33 GB       153,541  689,888  Apr 2007  106,116
2,302,367  25.23 GB       163,775  747,945  May 2007  112,178
The server is an NT box which is a bit starved for ram at the moment: 96MB
only. But I'll be upgrading it again soon.
The only problem with mirroring is that the site accepts updates instantly
via the form you used to post your offer. Keeping two sites in sync is something
I haven't worked out, but I'll take any help I can get on that issue.
Thanks!
Ivan Kocher Says: " If you need help with Linux I can help.
I do so for a living and electronics is my second.
drop me a line, to see how this can be done :)
Ivan "
James Newton replies: Thank you kindly. Frankly, I have neither the time
nor the desire to learn Linux. With the searches limited to one every few
nor the desire to learn Linux. With the searches limited to one every few
seconds, the server does just fine.
If you were interested in mirroring the site or providing a search engine
from your own server, I would be willing to try to set up such a thing.
At one time I was looking into using RSYNC (I think?) on the NT box to send
out notice of changes and keep a remote mirror up to date. The idea was that
the remote mirror could provide the search function (as well as the content
if desired) and so remove the burden from the local box.
Again, I'm NOT interested in hosting, learning, or in other ways touching
*nix. Nothing personal, just my preference based on my lone experience.
Thanks again.
See also:
Every now and again, I find "RUNDLL32 SETUPAPI,InstallHinfSection DefaultInstall
1 {out}.inf" running in an unattached process. It turns out that is part
of a batch file that I wrote years ago to restart the server every night.
The batch file gets called by task manager and tries to rundll this inf file
that causes the server to restart. If it doesn't restart, for whatever reason,
the process sits there doing nothing forever.
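One way to stop a stuck restart command from sitting around forever would be to launch it through a wrapper that gives up after a deadline. The following is a rough Python sketch of that idea, not the batch file actually in use. The function name and the timeout are assumptions, and the commands in the usage note are harmless stand-ins rather than the real RUNDLL32 line.

```python
import subprocess
import sys
from typing import List


def run_with_deadline(cmd: List[str], timeout_seconds: float) -> str:
    """Run cmd; return 'finished' if it exits in time, 'killed' if it was stopped."""
    try:
        subprocess.run(cmd, timeout=timeout_seconds, check=False)
        return "finished"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception,
        # so nothing is left hanging around afterwards.
        return "killed"
```

For example, `run_with_deadline([sys.executable, "-c", "import time; time.sleep(30)"], 1)` comes back as "killed" after about a second. A "killed" result is also a reasonable trigger for logging the failure or retrying the restart another way.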