SystemAdministration
Welcome
This is a comprehensive index covering system administration and management of our clusters, as well as some of the more arcane bits of setup required to make it all work.
Who we are
Sysop Team Main Page
nick | position | timezone |
---|---|---|
paulej72 | Co-leader | UTC-4 (EDT) |
mechanicjay | Co-leader | UTC-4 (EST/EDT) |
NCommander | Member | UTC-9 (AKDT) |
Audioguy | Member | UTC-7 (PST/PDT) |
Index of Development Pages and Resources
Servers
List of servers on linode: Category:SystemAdministration/Servers
- soylent-www - Primary Apache and Slash servers for the main site.
- soylent-db - MySQL servers, holds the Slash database.
- dev - Development server.
- staff-slash - Staff-only Slash server.
- irc - IRC server and related services.
- backups - Backup services.
- directory services - LDAP and Kerberos.
- soylent-services - Mail, wiki, other services as needed.
Known Problems
- Need a cron job to back up the server
- No init script for Apache.
- Broken HTTPS configuration
  - Mostly fixed; Slash is the problem child now
- Gluster occasionally misfires, which manifests as Apache or slashd crashing, depending on the node. It can be fixed with the following command cocktail:
    sudo umount -l /srv/soylentnews.org    # tells Linux to lazy unmount; required when glusterd took a dive
    sudo service glusterfs-server restart
    sudo mount -a                          # will remount gluster without an issue
  Then restart Apache/slashd as required.
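The Gluster recovery commands above can be wrapped in a small helper so they always run in the same order. This is only a sketch: the `run` and `recover_gluster_mount` function names and the `DRY_RUN` and `MOUNT_POINT` variables are made up for illustration, and the mount path is the one quoted in the note. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: recover a wedged Gluster mount using the three commands from the note.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.

MOUNT_POINT="${MOUNT_POINT:-/srv/soylentnews.org}"   # path taken from the note
DRY_RUN="${DRY_RUN:-1}"

# Print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Run the recovery steps in order, stopping at the first failure.
recover_gluster_mount() {
    run sudo umount -l "$MOUNT_POINT" || return 1          # lazy unmount, needed when glusterd has died
    run sudo service glusterfs-server restart || return 1  # bring the Gluster daemon back
    run sudo mount -a || return 1                          # remount everything listed in /etc/fstab
    echo "Now restart Apache or slashd on this node as required."
}
```

To use it, source the file and call `recover_gluster_mount` to preview the commands, then repeat with `DRY_RUN=0` once the output looks right. Restarting Apache or slashd is left as a manual step because which one needs it depends on the node.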
Stuff That Needs To Be Addressed
- Hydrogen is offline due to performance problems
- Gluster is unstable on Fluorine and Boron and sometimes Hydrogen
- icinga/monitoring project needs to be picked up and completed
- Landscape - I appreciate NCommander's ability to obtain for free a product that is normally sold, but if it is not being used it should not be running and using resources. (helium, perhaps others)
- Should have some sort of SN password safe
- Privilege Duplication - making sure that all services have multiple admins
- DNS - Audioguy is investigating some goofiness
- Systems Documentation needs to be revised and brought up to date.
- Work Coordination - communication is not always good when fundamental things change.
- There is no firewall configuration at all, which is something I normally set up before even one network cable is plugged in. I understand the desire to have an open system on public interfaces, but I see no reason why systems that are not publicly accessible, such as database backends, should not be firewalled off from Chinese and other such hackers. One example is the attempts being made on Helium from 120.192.20.162 (120.192.0.0/11, China Mobile Communications Corporation) at the moment I am writing this.
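As a starting point for the firewall item above, here is a minimal sketch of what fencing off a database backend could look like with iptables. It makes several assumptions that are not from this page: that the database listens on MySQL's default port 3306, and that internal clients live in a private range (the `192.168.0.0/16` value is a placeholder, not our real network). The function name and variables are hypothetical. It only prints the rules so they can be reviewed before anything is applied.

```shell
#!/bin/sh
# Sketch: print iptables rules that limit a database backend to internal clients.
# Nothing is applied here; review the output before running any of it as root.

DB_PORT="${DB_PORT:-3306}"                       # MySQL default port (assumed, not confirmed)
INTERNAL_NET="${INTERNAL_NET:-192.168.0.0/16}"   # placeholder private range; substitute the real one

print_db_firewall_rules() {
    # Keep loopback traffic and already-established connections working.
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    # Allow the database port only from the internal network...
    echo "iptables -A INPUT -p tcp --dport $DB_PORT -s $INTERNAL_NET -j ACCEPT"
    # ...and drop connections to it from everywhere else.
    echo "iptables -A INPUT -p tcp --dport $DB_PORT -j DROP"
}
```

Calling `print_db_firewall_rules` after sourcing the file lists the four rules. Only the database port is restricted, so public-facing services on the same node stay open, which matches the stated preference for open public interfaces.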
Work Notes
DNS is run and managed entirely by Linode's DNS Manager service. This was an expedient decision when trying to get off Bluehost. We may want to investigate putting the master zone file on helium or boron and having external services handle serving out our DNS.
Resources
- Access Instructions - how to get on the nodes, get around, and a Kerberos primer for users
- Group Permissions - understanding our LDAP groups, what machines they can access, and where you can sudo
- LDAP Management for Dummies - how to do basic shit in that source of misery known as LDAP
- The Rise And Fall Of New Node Management - from bash to fully integrated node, this doc has it
- The Hitchhiker's Guide to The li694-22 Domain - machine list, general information
- Kerberos Administration Or Everything You Wanted To Know About Kerberos But Were Afraid To Ask
- DnsRecords - copy of the zone file pasted from the machine that was accidentally turned off
- Emergency Technical Procedures - in case of fire, break glass (Read before messing with servers)
- Backup Information - information on where and what is being backed up.