Previously I’ve gotten as far as getting the main console more or less working and UNICOS checking in. After cleaning up some garbage on the screen due to some misunderstanding of how the sequence number and acknowledge system worked for the terminal, I have a much cleaner picture now:
Looking at the screen, there are two ominous warnings: one that the system apparently ‘cannot open root with disk inode’ and the other reporting the file-system to be full. I’ll let the first one slide for now since the system apparently still boots, and look at the second problem: why is the file-system full, and what can be done about it?
Turning on logging of disk activity makes the culprit pretty easy to identify: the OS tries to write the full kernel memory content onto the disk (sort of like a core dump). But why would it dump core? If the kernel had actually crashed, it wouldn’t have gotten as far as it did, and if it wasn’t the kernel that crashed, the dump wouldn’t be of the full memory, only of the process’s memory space.
Maybe that’s just what UNICOS does: it dumps the memory on boot for fun and giggles.
However, that possibility brings up the next problem: the file-system I have (ram_fs) clearly isn’t big enough to hold the dump.
(Update: As I found out much later, UNICOS only dumps the kernel memory space if it’s booting off of an IOS-attached disk as opposed to a ram_fs. In this case, however, I had emulated a virtual hard drive containing the ram_fs image, which confused the OS. At this point I didn’t even know how to boot off of a RAM drive, let alone that this was the problem; I only figured that out much later.)
Whatever the reason is, the solution seems to be to figure out how to re-size an existing file-system. This might turn out to be hard to do, but I have to reverse-engineer the file-system anyway: at the moment I can only exchange files with the OS running in the simulator through creating a FS on the host and mounting it inside the simulator. And exchanging files I must if I want to install the full OS: I need to transfer the install media.
So, how should I go about it?
Let’s go back to our trusted source of information: /usr/include/sys. There’s actually a whole directory here dealing with file-system stuff:
$ ls -la
total 165
drwxrwx---+ 1 tantos None     0 Jun 18 07:11 .
drwxrwx---+ 1 tantos None     0 Jun 18 07:11 ..
-rwxrwx---+ 1 tantos None   472 Jun 18 07:11 nc1dir.h
-rwxrwx---+ 1 tantos None 10251 Jun 18 07:11 nc1filsys.h
-rwxrwx---+ 1 tantos None  7709 Jun 18 07:11 nc1ino.h
-rwxrwx---+ 1 tantos None 16110 Jun 18 07:11 nc1inode.h
-rwxrwx---+ 1 tantos None 11884 Jun 18 07:11 nc1proto.h
-rwxrwx---+ 1 tantos None  1625 Jun 18 07:11 ncdir.h
drwxrwx---+ 1 tantos None     0 Jun 18 07:11 nfs
-rwxrwx---+ 1 tantos None  7001 Jun 18 07:11 prfcntl.h
-rwxrwx---+ 1 tantos None  1313 Jun 18 07:11 prmount.h
-rwxrwx---+ 1 tantos None  2307 Jun 18 07:11 prnode.h
-rwxrwx---+ 1 tantos None   985 Jun 18 07:11 sfsblock.h
-rwxrwx---+ 1 tantos None  1034 Jun 18 07:11 sfsconsts.h
Of primary interest are nc1filsys.h, nc1ino.h and ncdir.h. Between them they contain the layout of the super-block and of something called the dynamic block (both in nc1filsys.h), the inode structure (nc1ino.h) and the directory structure (ncdir.h).
/* USMID @(#)uts/c1/sys/fs/nc1filsys.h  100.1   04/16/98 12:47:39 */

/*      COPYRIGHT CRAY RESEARCH, INC.
 *      UNPUBLISHED -- ALL RIGHTS RESERVED UNDER
 *      THE COPYRIGHT LAWS OF THE UNITED STATES.
 */

#ifndef __C1SYS_FS_NC1FILSYS_H_
#define __C1SYS_FS_NC1FILSYS_H_

/*
 * NC1 File system structures and definitions
 */

/*
 * Inode region descriptor.
 * The first block of an inode region is a
 * bit map for the inodes in that region.
 */
struct nc1ireg_sb {
        uint    i_unused:16,    /* reserved */
                i_nblk  :16,    /* number of blocks */
                i_sblk  :32;    /* start block number */
};

struct nc1ireg_db {
        uint    i_avail;        /* number of available inodes */
};

#define NC1MAXIREG      4       /* Maximum inode regions per partition */
#define NC1IMAPBLKS     1       /* number of blocks in inode map */

struct nc1fdev_sb {
        long    fd_name;        /* Physical device name */
        uint    fd_sblk :32,    /* Start block number */
                fd_nblk :32;    /* Number of blocks */
        struct nc1ireg_sb fd_ireg[NC1MAXIREG];  /* Inode regions */
};

struct nc1fdev_db {
        int     fd_flag;        /* flag word */
        struct nc1ireg_db fd_ireg[NC1MAXIREG];  /* Inode regions */
};

#define FDNC1_DOWN      1       /* Slice is not available */
#define FDNC1_RDONLY    2       /* Slice is read only */
#define FDNC1_NOALLOC   4       /* Slice is not available for allocation */
#define FDNC1_SBDB      010     /* Slice has valid FS tables */
#define FDNC1_RTDIR     020     /* Slice has valid ROOT Inode and directory */
#define FDNC1_SECALL    0100    /* Slice sector allocated */

#define NC1MAXPART      64      /* Maximum number of partitions */

/*
 * Structure of the super-block
 */
struct nc1filsys {
        long    s_magic;        /* magic number to indicate file system type */
        char    s_fname[8];     /* file system name */
        char    s_fpack[8];     /* file system pack name */
        dev_t   s_dev;          /* major/minor device, for verification */

        daddr_t s_fsize;        /* size in blocks of entire volume */
        int     s_isize;        /* Number of total inodes */
        long    s_bigfile;      /* number of bytes at which a file is big */
        long    s_bigunit;      /* minimum number of blocks allocated for big files */

        long    s_secure;       /* security: secure FS label */
        int     s_maxlvl;       /* security: maximum security level */
        int     s_minlvl;       /* security: minimum security level */
        long    s_valcmp;       /* security: valid security compartments */

        time_t  s_time;         /* last super block update */
        blkno_t s_dboff;        /* Dynamic block number */
        ino_t   s_root;         /* root inode */
        int     s_error;        /* Type of file system error detected */
        blkno_t s_mapoff;       /* Start map block number */
        int     s_mapblks;      /* Last map block number */
        int     s_nscpys;       /* Number of copies of s.b per partition */
        int     s_npart;        /* Number of partitions */
        int     s_ifract;       /* Ratio of inodes to blocks */
        extent_t s_sfs;         /* SFS only blocks */
        long    s_flag;         /* Flag word */
        struct nc1fdev_sb s_part[NC1MAXPART];   /* Partition descriptors */
        int     s_iounit;       /* Physical block size */
        long    s_numiresblks;  /* number of inode reservation blocks */
                                /* per region (currently 1) */
                                /* 0 = 1*(AU) words, n = (n+1)*(AU) words */
        long    s_priparts;     /* bitmap of primary partitions */
        long    s_priblock;     /* block size of primary partition(s) */
                                /* 0 = 1*512 words, n = (n+1)*512 words */
        long    s_prinblks;     /* number of 512 wds blocks in primary */
        long    s_secparts;     /* bitmap of secondary partitions */
        long    s_secblock;     /* block size of secondary partition(s) */
                                /* 0 = 1*512 words, n = (n+1)*512 words */
        long    s_secnblks;     /* number of 512 wds blocks in secondary */
        long    s_sbdbparts;    /* bitmap of partitions with file system data */
                                /* including super blocks, dynamic block */
                                /* and free block bitmaps (only primary */
                                /* partitions may contain these) */
        long    s_rootdparts;   /* bitmap of partitions with root directory */
                                /* (only primary partitions) */
        long    s_nudparts;     /* bitmap of no-user-data partitions */
                                /* (only primary partitions) */
        long    s_nsema;        /* SFS: # fs semaphores to allocate */
        long    s_priactive;    /* bitmap of primary partitions which contain */
                                /* active (up to date) dynamic blocks and */
                                /* free block bitmaps. All bits set indicate */
                                /* that all primary partitions are active, */
                                /* and no kernel manipulation of active flag */
                                /* is allowed. */
        int     s_sfs_arbiterid;/* SFS Arbiter ID */
        long    s_fill[91];     /* reserved */
};

#define NC1NSUPER       10      /* Copies of s.b. per partition */
#define NC1MINPARTSZ    (6+NC1NSUPER)   /* Minimum blocks per partition */
#define NC1MAXACTIVEPARTS 4     /* Max. number of primary partitions */
                                /* maintained as up to date */

#define FsMAGIC_NC1     0x6e6331667331636e      /* s_magic number */
#define FsSECURE        0xcd076d1771d670cd      /* s_secure: secure file system */

/*
 * Filesystem errors
 */
#define Fs_SUPER        1       /* Bad super block encountered */
#define Fs_DYNAMIC      2       /* Bad dynamic block encountered */
#define Fs_SHARED       3       /* Bad shared block encountered */
#define Fs_MAP          4       /* Bad map block encountered */
#define Fs_SFS_SYSDOWN  5       /* SFS System Active sema cleared */

/*
 * Filesystem super block flags
 * (For mostly historical reasons nearly all of the flags defined here,
 * and later for dynamic block usage, are mutually exclusive.
 * This practice dates back to before the time when the super block
 * and dynamic block were separated.)
 * Many of the bit combinations that aren't apparently `available' have
 * probably been moved to m_fsflag in the mount table.
 * As these flags get separated into different data structures, the
 * need &/or desire to retain their old bit notations rapidly decreases.
 * The flags that remain in the super block, or dynamic block, must stay
 * in the current form, as these flags are carried on-media, and become
 * a matter of file system compatibility.
 */
#define Fs_PANIC        0000000001      /* not used */
#define Fs_RRFILE       0000000002      /* Round robin file allocation */
#define Fs_RRALLDIR     0000000004      /* Round robin all directories */
#define Fs_RR1STDIR     0000000010      /* Round robin 1st level directories */
/*      Fs_CHECKED      0000000040         Flag used in the Dynamic Blk */
/*      Fs_MOUNTED      0000000100         Flag used in the Dynamic Blk */
#define Fs_UPDATE       0000001000      /* File system update in progress */
#define Fs_WUPDAT       0000002000      /* File system wakeup after update */
#define Fs_RRALLUDATA   0000020000      /* Round robin all user file data */
#define Fs_NOIPREF      0000040000      /* Inode alloc. preference disabled */
#define Fs_PANICLESS    0000100000      /* Attempt to continue on error */
#define Fs_SCRUB        0000200000      /* Enable/Disable filesystem scrub */
#define Fs_SFS          0010000000      /* Shared File system */
#define Fs_TESTCOND1    0100000000      /* Test condition #1 */
#define Fs_TESTCOND2    0200000000      /* Test condition #2 */
#define Fs_TESTCOND3    0400000000      /* Test condition #3 */

struct nc1dblock {
        long    db_magic;       /* magic number to indicate file system type */
        daddr_t db_tfree;       /* total available blocks */
        int     db_ifree;       /* total free inodes */
        int     db_ninode;      /* total allocated inodes */

        long    db_state;       /* file system state */
        time_t  db_time;        /* last dynamic block update */
        long    db_type;        /* type of new file system */
        int     db_spart;       /* Partition from which system mounted */
        int     db_ifptr;       /* Inode allocation pointer */
        int     db_actype;      /* device accounting type (for billing) */
        long    db_flag;        /* Flag word */
        long    db_res1[10];    /* reserved */
        struct nc1fdev_db db_part[NC1MAXPART];  /* Partition descriptors */
        lockinfo_t db_lockinf;  /* proc of the process locking the filesystem */
        int     db_dpfptr;      /* primary partitions allocation pointer */
        int     db_dsfptr;      /* secondary partitions allocation pointer */
        daddr_t db_sfree;       /* secondary parts free blocks */
        union {
                int     db_fpmapfil[16];
                struct map db_fpm;      /* Free blk map hdr - primary part. */
        } db_fpmap_u;
        union {
                int     db_fsmapfil[16];
                struct map db_fsm;      /* Free blk map hdr - secondary part. */
        } db_fsmap_u;
        long    db_fill[133];   /* reserved */
};

#define db_fpmap        db_fpmap_u.db_fpm
#define db_fsmap        db_fsmap_u.db_fsm
#define db_fptr         db_ifptr
#define db_fmap         db_fpmap

#define DbMAGIC_NC1     0x6e6331646231636e      /* db_magic number */

/*
 * Filesystem dynamic block flags
 */
#define Fs_CHECKED      0000000040      /* File system checked */
#define Fs_MOUNTED      0000000100      /* File system mounted */

/*
 * Macros that result an a pointer to a file system Mount structure
 */
#define VFS_TO_MP(vfsp) \
        ((struct mount *)((vfsp)->vfs_data))

#define VP_TO_MP(vp) \
        ((struct mount *)(((vp)->v_vfsp)->vfs_data))

/*
 * Macros that result an a pointer to a file system Super Block
 */
#define MP_TO_NC1SB(mp) \
        ((struct nc1filsys *)(((mp)->m_bufp)->b_waddr))

#define VFS_TO_NC1SB(vfsp) \
        ((struct nc1filsys *)((VFS_TO_MP(vfsp)->m_bufp)->b_waddr))

#define VP_TO_NC1SB(vp) \
        ((struct nc1filsys *)((VP_TO_MP(vp)->m_bufp)->b_waddr))

/*
 * Macros that result an a pointer to a file system Dynamic Block
 */
#define MP_TO_NC1DB(mp) \
        ((struct nc1dblock *)(((mp)->m_dbufp)->b_waddr))

#define VFS_TO_NC1DB(vfsp) \
        ((struct nc1dblock *)((VFS_TO_MP(vfsp)->m_dbufp)->b_waddr))

#define VP_TO_NC1DB(vp) \
        ((struct nc1dblock *)((VP_TO_MP(vp)->m_dbufp)->b_waddr))

/*
 * Macros that result an a pointer to a file system SFS Control Block
 */
#define MP_TO_NC1SFSB(mp) \
        ((struct sfsdblk *)(((mp)->m_sfsbufp)->b_waddr))

#define VFS_TO_NC1SFSB(vfsp) \
        ((struct sfsdblk *)((VFS_TO_MP(vfsp)->m_sfsbufp)->b_waddr))

#define VP_TO_NC1SFSB(vp) \
        ((struct sfsdblk *)((VP_TO_MP(vp)->m_sfsbufp)->b_waddr))

#ifndef KERNEL

/*
 * Bit-position flags for fsgetsuper() flags argument.
 */
#define FSGETSUPER_USEFIRST     0001
#define FSGETSUPER_NOERRORS     0002

struct mntent;

int ismounted(char *special, ...);
int issfscapable(int *port_num, char **error_string);
int sfsgetpathnames(int arbiter_id, char *smpname, char *sfsname,
                    char *mntname);
int sfsgetarbiterid(char *arbiter_name);
int fsgetsuper(int sfd, char *fname, struct nc1filsys *sb,
               struct nc1dblock *db, uint flags, int *ret_iou);
int get_fs_sema(struct nc1filsys *sb);
int print_shared_mount_table(char *arbiter_list);
int sfsaddmntent(struct mntent *newmnt, struct nc1filsys *sb);
int sfschkmntent(struct mntent *newmnt, struct nc1filsys *sb);
int sfsdeletemntent(struct mntent *mnt_to_delete, int port);

#endif /* KERNEL */

#endif /* __C1SYS_FS_NC1FILSYS_H_ */
/* USMID @(#)uts/c1/sys/fs/nc1ino.h     100.1   04/16/98 12:47:40 */

/*      COPYRIGHT CRAY RESEARCH, INC.
 *      UNPUBLISHED -- ALL RIGHTS RESERVED UNDER
 *      THE COPYRIGHT LAWS OF THE UNITED STATES.
 */

#ifndef __C1SYS_FS_NC1INO_H_
#define __C1SYS_FS_NC1INO_H_

#include <sys/fs/sfsconsts.h>

/*
 * Typedefs used by the filesystem dependent code
 */
typedef struct {
        uint    iaf     : 1;
        uint            : 15;
        dev_t   dev     : 16;
        blkno_t blk     : 32;
} dblk_t;

typedef struct {
        int     nblks   : 32;
        blkno_t blk     : 32;
} extent_t;

/*
 * On-disk inode structure as it appears on the NC1FS & NC2FS filesystems
 */
struct cdinode {
        uint    cdi_rsrvd_1 : 8,   /* Reserved for expansion of cdi_mode */
                cdi_mode    :24,   /* mode and type of file (4-bits still free)*/
                cdi_msref   : 1,   /* Modification signature is referenced flag*/
                cdi_ms      :14,   /* Modification signature */
                cdi_nlink   :17;   /* #of links to file (can hold > 100,000) */
        uint    cdi_rsrvd_2 : 8,   /* Reserved for expansion of cdi_uid */
                cdi_uid     :24,   /* Owner's user-ID */
                cdi_rsrvd_3 : 8,   /* Reserved for expansion of cdi_gid */
                cdi_gid     :24;   /* Owner's group-ID */
        uint    cdi_rsrvd_4 : 8,   /* Reserved for expansion of cdi_acid */
                cdi_acid    :24,   /* Account-ID */
                cdi_gen     :32;   /* Inode generation number */
        long    cdi_size;          /* Number of bytes in the file */
        long    cdi_moffset;       /* Modification offset for current signature*/
        uint    cdi_blocks  :52,   /* Quotas: #of blocks actually allocated */
                cdi_extcomp : 1,   /* Security: extended compartments flag */
                cdi_secrsvd1:11;   /* Security: reserved */
        union {
                long    smallcmps; /* Compartments if [0..63] */
        } cdi_compart;             /* Security: compartments info */
        uint    cdi_slevel  : 8,   /* Security: security level */
                cdi_intcls  : 8,   /* Security: integrity class */
                cdi_secflg  :16,   /* Security: flag settings */
                cdi_intcat  :32;   /* Security: integrity category */
        union {
                daddr_t daddr;     /* Extent descriptor */
                dblk_t  dblk;      /* Block descriptor */
        } cdi_privs;               /* Privilege Assignment List location */
        union {
                daddr_t daddr;     /* Extent descriptor */
                dblk_t  dblk;      /* Block descriptor */
        } cdi_acl;                 /* Security: ACL location */
        uint    cdi_cpart   : 8,   /* Next partition from cbits to use */
                cdi_dmport  : 3,   /* DMF daemon number */
                cdi_dmstate : 5,   /* DMF file state */
                cdi_dmkey   :48;   /* Data-Migration: key */
        uint    cdi_allocf  : 4,   /* Data-Block allocation flags */
                cdi_alloc   : 4,   /* Data-Block allocation technique */
                cdi_cblks   :24,   /* Number of blocks to allocate per part */
                cdi_dmmid   :32;   /* Data-Migration: machine-ID */
        uint    cdi_atmsec  :34,   /* Access time (secs) */
                cdi_uatmsec :30;   /* Access time (microsecs) */
        uint    cdi_mtmsec  :34,   /* Modification time (secs) */
                cdi_umtmsec :30;   /* Modification time (microsecs) */
        uint    cdi_ctmsec  :34,   /* Time of last inode modification (secs) */
                cdi_uctmsec :30;   /* Time of last inode modification (microsecs)*/
        long    cdi_cbits;         /* bit mask, file placement within cluster */
        union {
                daddr_t daddr;     /* Extent descriptor */
                dblk_t  dblk;      /* Block descriptor */
                long    whole;
                struct {
                        uint    one   :32,      /* half 1 */
                                two   :32;      /* half 2 */
                } half;
                struct {
                        uint    one   :16,      /* quarter 1 */
                                two   :16,      /* quarter 2 */
                                three :16,      /* quarter 3 */
                                four  :16;      /* quarter 4 */
                } quarter;
                struct {
                        uint    one   : 8,      /* eighth 1 */
                                two   : 8,      /* eighth 2 */
                                three : 8,      /* eighth 3 */
                                four  : 8,      /* eighth 4 */
                                five  : 8,      /* eighth 5 */
                                six   : 8,      /* eighth 6 */
                                seven : 8,      /* eighth 7 */
                                eight : 8;      /* eighth 8 */
                } eighth;
        } cdi_addr[8];             /* File allocation locators */
                                   /* The #define for NC1NADDR must not be > 8 */
        long    cdi_slock[SFSLK_SZ]; /* Reserved for SFS lock structure */
        uint    cdi_rsrvd_5 : 16,  /* Reserved for Kernel group for expansion */
                cdi_applac  : 32,  /* Application accounting tag */
                cdi_nindir  : 16;  /* # of indirect extent blocks */
        long    cdi_rsrvd;         /* Reserved by the Kernel group for use in */
                                   /* future releases of UNICOS. */
                                   /* No notification will be given when these */
                                   /* words will be employed by future versions */
                                   /* of UNICOS. */
        long    cdi_sitebits;      /* Word reserved for site use. */
};

/* NOTE: Reserved fields in the cdinode structure will be preserved
 * when an inode is updated. They will be cleared when an inode is
 * newly allocated.
 */

#define NC1INOPB        16

/*
 * struct cdinode cdi_addr defines
 *
 * For IFCHR & IFBLK devices, the cdi_addr words are used to hold
 * special security and configuration information, as well as
 * the device's rdev field.
 */
#define cdi_rdev        cdi_addr[0].half.one    /* IFCHR or IFBLK rdev */
/*                      cdi_addr[0].half.two       Reserved */

/* Security fields related to devices */
#define cdi_minlvl      cdi_addr[1].eighth.one  /* minimum level */
#define cdi_maxlvl      cdi_addr[1].eighth.two  /* maximum level */
/*                      cdi_addr[1].quarter.two    Reserved */
/*                      cdi_addr[1].half.two       Reserved */
#define cdi_valcmp      cdi_addr[2].whole       /* valid compartments */
/*                      cdi_addr[3].whole          Reserved */

/* Configuration related device parameters */
#define cdi_param0      cdi_addr[4].half.one
#define cdi_param1      cdi_addr[4].half.two
#define cdi_param2      cdi_addr[5].half.one
#define cdi_param3      cdi_addr[5].half.two
#define cdi_param4      cdi_addr[6].half.one
#define cdi_param5      cdi_addr[6].half.two
#define cdi_param6      cdi_addr[7].half.one
#define cdi_param7      cdi_addr[7].half.two
#define cdi_filename    cdi_addr[5].whole       /* Filename of logical */

/*
 * The i_number in the NC1FS is made up of three elements,
 * the partition number, the inode region number and the
 * inode region relative inode number. The maximum size of
 * each of these fields was selected for potential expansion
 * at a later date. The current composition of the i_number
 * is:
 *      32/ 0, 8/ Partition, 4/ Iregion, 20/relative inum
 */
#define nc1ino_rinum(x) (int) ((x)&((1<<20)-1))
#define nc1ino_ireg(x)  (int) (((x)>>20)&017)
#define nc1ino_part(x)  (int) (((x)>>24)&0377)
#define makenc1ino(p,r,i) (ino_t)((long) (p)<<24 | (long) (r)<<20 | (i))

/*
 * NC1FS i_number to disk block (inode region relative) and offset
 */
#define nc1itodf(fp,i)  (fp->s_part[nc1ino_part(i)].fd_ireg[nc1ino_ireg(i)]\
                        .i_sblk+nc1itod(i)+nc1imapblks(fp))
#define nc1itod(i)      (nc1ino_rinum(i) / NC1INOPB)
#define nc1itoo(i)      ((nc1ino_rinum(i) % NC1INOPB) * sizeof(struct cdinode))
#define nc1itos(i, iou) ( (iou) > 1 ? (nc1itod(i) / (iou)) : nc1itod(i) )
#define nc1imapblks(fp) ((fp->s_priblock+1)*(fp->s_numiresblks+1))

/*
 * Allocation types used in cdi_alloc.
 */
#define C1_EXTENT       1       /* Cray-1, X/YMP extent-based allocation */
#define C2_TRACK        2       /* Cray-2 sector/track block-style allocation */

/*
 * Allocation flags used in cdi_allocf.
 */
#define CDI_ALF_NOGRW   001     /* Allocation is not allowed to grow */
#define CDI_ALF_PARTR   002     /* Allocation of partition type only */
#define CDI_ALF_RES1    004     /* unused */
#define CDI_ALF_RES2    010     /* unused */

struct nc1bmap_pos {
        struct inode *nc1p;     /* inode address */
        daddr_t indblk;         /* indirect block number (or 0) */
        int     exntni;         /* extent index of current pos */
        int     curblk;         /* logical block number not */
                                /* including blocks at current pos */
};

#endif /* __C1SYS_FS_NC1INO_H_ */
/* USMID @(#)uts/include/sys/fs/ncdir.h 100.0   07/11/97 02:46:40 */

/*      COPYRIGHT CRAY RESEARCH, INC.
 *      UNPUBLISHED -- ALL RIGHTS RESERVED UNDER
 *      THE COPYRIGHT LAWS OF THE UNITED STATES.
 */

#ifndef __NCDIR_H_
#define __NCDIR_H_

/*
 * Directory structure for Cray file systems using Berkeley-style format:
 *
 * The directory structure for these file systems is modeled after
 * the Berkeley directory structure. However, unlike the Berkeley
 * implementation, the name in a directory entry is NOT null
 * terminated.
 */

#define CDMAXNAMELEN    255     /* Maximum length of file name */
#define CDIRBLKSIZ      BSIZE   /* Directory fragment size; */
                                /* WARNING: Must be a power of 2 */

/*
 * CDIRSIZ() is a macro that given a pointer to an allocated directory
 * entry, returns the smallest possible size (in bytes) for the directory
 * entry. Any difference in the value returned by this macro and the actual
 * size of the directory entry record is free space within the directory
 * chunk.
 */
#define CDIRSIZ(dp) \
        (sizeof(struct cdirect) - ((CDMAXNAMELEN + NBPW-1) & ~(NBPW-1)) \
         + ((dp)->cd_namelen + NBPW-1) & ~(NBPW-1))

struct cdirect {
        unsigned long   cd_ino;                 /* Inode for name */
        unsigned long   cd_sino;                /* Reserved for future use */
        unsigned short  cd_reserved:10,         /* Reserved for future use */
                        cd_signature:22,        /* Name signature */
                        cd_reclen:22,           /* Record length (bytes) */
                        cd_namelen:10;          /* Length of name (bytes); */
                                                /* MUST = 0 if cd_ino = 0! */
        unsigned char   cd_name[CDMAXNAMELEN];  /* NON-null terminated name */
};

#endif /* __NCDIR_H_ */
OK, that was a mouthful. There are many things to tease out here, so let’s start!
The structure of the file-system is based on the traditional UNIX approach, but there are a few key differences. The whole file-system thinks of the disk as a series of 4kByte blocks. These blocks conveniently map to sectors on the hard drives used in the J-90, but that’s not necessarily a requirement. A file-system on the machine could be spread across multiple ‘partitions’ on multiple drives and could support various striping configurations, though those details are not terribly important for a SW simulator. The mapping of sector-ranges to partitions and file-systems is part of the parameter file, though some of this information is duplicated in the file-system itself. (In other words, there’s no partition table on the hard drives.)
Blocks are numbered consecutively, starting at 0, through all the partitions that constitute a file-system. At least I think so: I’ve seen sections of code that seem to iterate through all the partitions, subtracting what appears to be the partition size, to determine the physical sector corresponding to a logical block.
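The partition walk described above can be sketched roughly as follows. This is only an illustration of the guess in the text, not code lifted from UNICOS: `PartitionExtent`, `PhysicalLocation` and `LogicalToPhysical` are hypothetical names, and one 4kByte block is assumed to map to exactly one sector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side description of one partition of a file-system.
struct PartitionExtent {
    uint64_t StartSector;  // first sector of the partition on its drive
    uint64_t BlockCount;   // size of the partition in 4kByte blocks
};

struct PhysicalLocation {
    size_t PartitionIndex;  // which partition the block landed in
    uint64_t Sector;        // sector on that partition's drive
};

// Walk the partitions in order, subtracting each partition's size from the
// logical block number until the block falls inside one of them.
// Returns false if the block is beyond the end of the file-system.
bool LogicalToPhysical(const std::vector<PartitionExtent> &aPartitions,
                       uint64_t aBlock, PhysicalLocation &aLocation) {
    for (size_t Idx = 0; Idx < aPartitions.size(); ++Idx) {
        if (aBlock < aPartitions[Idx].BlockCount) {
            aLocation.PartitionIndex = Idx;
            aLocation.Sector = aPartitions[Idx].StartSector + aBlock;
            return true;
        }
        aBlock -= aPartitions[Idx].BlockCount;
    }
    return false;
}
```

With a single partition, which is all the simulator needs, this collapses to adding the partition's start sector to the block number.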
The most important information about the file-system is stored in the super-block. The primary copy of it is at block 1, with several copies sprinkled around the drive. This structure contains all the (more or less) static information about the file-system. The frequently changing info (like last mount time, locking, number of free inodes, etc.) is factored out into the dynamic block (nc1dblock). The dynamic block is also one block large and its location is recorded in the super-block.
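As a quick sanity check on a disk image, the magic number at the start of the super-block can be tested. The sketch below assumes the image starts at block 0 of the file-system and that words are stored big-endian, as they would be on the Cray; `ReadWordBE` and `LooksLikeNc1SuperBlock` are made-up helper names. Conveniently, FsMAGIC_NC1 spells ‘nc1fs1cn’ in ASCII (and DbMAGIC_NC1 spells ‘nc1db1cn’), which makes both blocks easy to spot in a hex dump.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kBlockSize = 4096;                        // 512 words of 8 bytes
constexpr uint64_t kFsMagicNc1 = 0x6e6331667331636eULL;    // FsMAGIC_NC1 from nc1filsys.h

// Read one 64-bit word stored most-significant byte first.
uint64_t ReadWordBE(const uint8_t *aData) {
    uint64_t Word = 0;
    for (size_t Idx = 0; Idx < 8; ++Idx) {
        Word = (Word << 8) | aData[Idx];
    }
    return Word;
}

// s_magic is the first field of struct nc1filsys, so it sits at byte 0 of
// block 1 (the primary super-block copy).
bool LooksLikeNc1SuperBlock(const std::vector<uint8_t> &aImage) {
    const size_t Offset = 1 * kBlockSize;
    if (aImage.size() < Offset + 8) {
        return false;  // image too small to contain block 1
    }
    return ReadWordBE(&aImage[Offset]) == kFsMagicNc1;
}
```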
These two together describe the file-system layout but not the content. For that, we’ll need a set of inodes and something called the FREEMAP. Each inode describes one entity (the content of a file or a directory) on the file-system. It is the key structure from which the blocks containing the content can be accessed. An inode entry is 256 bytes long, so 16 of them fit in a block. A set of blocks is set aside for inode storage when the file-system is created. These regions are described in the super-block for each partition that constitutes the file-system.
Given an inode number, its block offset can be determined by dividing the number by 16. This block offset can then be used in an iteration through the inode allocation regions in the super-block to convert it to an absolute block number. Within the block, the inode number modulo 16 (multiplied by 256) gives the byte offset of the struct.
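The macros in nc1ino.h (nc1ino_part, nc1ino_ireg, nc1ino_rinum, nc1itodf and nc1itoo) suggest a slightly more detailed picture: the inode number also carries a partition and a region index, and the first block(s) of each region hold the inode bitmap. A minimal sketch of that lookup follows; the struct and function names are made up for illustration, and the region start block and map size are passed in rather than read from a real super-block.

```cpp
#include <cstdint>

constexpr uint32_t kInodesPerBlock = 16;   // NC1INOPB
constexpr uint32_t kInodeSize = 256;       // bytes per on-disk inode

// The three fields packed into an i_number: 8/partition, 4/region, 20/relative.
struct InodeNumberParts {
    uint32_t Partition;
    uint32_t Region;
    uint32_t Relative;
};

InodeNumberParts SplitInodeNumber(uint64_t aInode) {
    InodeNumberParts Parts;
    Parts.Partition = uint32_t((aInode >> 24) & 0xff);
    Parts.Region = uint32_t((aInode >> 20) & 0xf);
    Parts.Relative = uint32_t(aInode & 0xfffff);
    return Parts;
}

// Mirrors nc1itodf(): region start block + inode map blocks + relative / 16.
// aRegionStartBlock would come from s_part[p].fd_ireg[r].i_sblk.
uint64_t InodeBlock(uint32_t aRelative, uint64_t aRegionStartBlock,
                    uint64_t aInodeMapBlocks) {
    return aRegionStartBlock + aInodeMapBlocks + aRelative / kInodesPerBlock;
}

// Mirrors nc1itoo(): byte offset of the inode within its block.
uint32_t InodeByteOffset(uint32_t aRelative) {
    return (aRelative % kInodesPerBlock) * kInodeSize;
}
```

For a single-partition, single-region file-system the partition and region fields are both zero, so the simple "divide by 16" description in the text gives the same answer apart from the bitmap block at the start of the region.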
The root inode number is two in all UNIX systems. UNICOS apparently has the feature of changing that default (there’s a field for that in the super-block), but I decided not to mess with it.
Contrary to the original UNIX file-system design, there’s no free-block list. Instead, a bitmap is stored on the hard drive, which records the state of each block on the file-system: 0 for free, 1 for occupied. This structure is called the FREEMAP, and its location and size are recorded in the super-block (s_mapoff and s_mapblks fields).
Theoretically this information is not strictly necessary: one can iterate through all inodes, record all the allocated blocks, and whatever is not allocated is, by definition, free. This is a lengthy process though, so understandably the OS caches the result. The fsck utility, among other things, checks and fixes any inconsistencies between the inodes and the FREEMAP.
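Rebuilding such a bitmap from the list of allocated extents is straightforward. Here is a minimal sketch; the bit order (block 0 in the most significant bit of the first byte) is an assumption that would need checking against a real image, and `Extent`, `BuildFreeMap`, `IsBlockUsed` and `CountFreeBlocks` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A contiguous run of allocated blocks, as collected from the inodes.
struct Extent {
    uint64_t StartBlock;
    uint64_t BlockCount;
};

// One bit per block: 0 = free, 1 = occupied.
// ASSUMPTION: block 0 maps to the most significant bit of byte 0.
std::vector<uint8_t> BuildFreeMap(const std::vector<Extent> &aUsedExtents,
                                  uint64_t aTotalBlocks) {
    std::vector<uint8_t> Map((aTotalBlocks + 7) / 8, 0);
    for (const Extent &Used : aUsedExtents) {
        assert(Used.StartBlock + Used.BlockCount <= aTotalBlocks);
        for (uint64_t Block = Used.StartBlock;
             Block < Used.StartBlock + Used.BlockCount; ++Block) {
            Map[Block / 8] |= uint8_t(0x80u >> (Block % 8));
        }
    }
    return Map;
}

bool IsBlockUsed(const std::vector<uint8_t> &aMap, uint64_t aBlock) {
    return (aMap[aBlock / 8] & (0x80u >> (aBlock % 8))) != 0;
}

// Counts clear bits; this is the kind of value db_tfree would cache.
uint64_t CountFreeBlocks(const std::vector<uint8_t> &aMap, uint64_t aTotalBlocks) {
    uint64_t Free = 0;
    for (uint64_t Block = 0; Block < aTotalBlocks; ++Block) {
        if (!IsBlockUsed(aMap, Block)) ++Free;
    }
    return Free;
}
```

An fsck-style consistency check would then amount to comparing the rebuilt map byte by byte against the one stored on disk.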
Inodes
As we’ve discussed, UNIX – pretty much all flavors of it – represents the content of every file (or directory) with an inode. The inode structure contains the list of blocks corresponding to the file. This structure is rather hairy, but the main use-case is fairly easy to grasp: the allocations for the file are held in an array of 8 entries: cdi_addr. Each allocation is a contiguous extent of sectors, so each entry has a start block and a block-count part. I’m sure for highly fragmented file-systems, indirect inodes also exist (when the 8 entries in the inode are insufficient to describe the whole file) but I didn’t bother figuring out how that works: due to the extent-based allocation, it’s pretty difficult to set up a scenario when 8 entries are insufficient. It certainly won’t be a problem for a FS created from scratch on the host.
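To make the extent idea concrete, here is a small sketch that translates a block offset within a file to a block on the file-system by walking the 8 entries. It assumes each cdi_addr entry is laid out like extent_t from nc1ino.h (a 32-bit block count followed by a 32-bit start block) and that unused entries have a zero count; indirect extent blocks are ignored, just as in the text. The names are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Mirrors extent_t: nblks:32 followed by blk:32.
struct Extent {
    uint32_t BlockCount;
    uint32_t StartBlock;
};

constexpr size_t kExtentsPerInode = 8;  // size of the cdi_addr array

// Find the file-system block holding block number aFileBlock of a file.
// Returns false if the offset lies beyond the extents stored in the inode.
bool FileBlockToDiskBlock(const std::array<Extent, kExtentsPerInode> &aExtents,
                          uint64_t aFileBlock, uint64_t &aDiskBlock) {
    for (const Extent &Run : aExtents) {
        if (aFileBlock < Run.BlockCount) {
            aDiskBlock = uint64_t(Run.StartBlock) + aFileBlock;
            return true;
        }
        aFileBlock -= Run.BlockCount;  // skip this run (zero-length runs are no-ops)
    }
    return false;
}
```

For a file-system created from scratch on the host, every file can be written as a single extent, so only the first entry is ever non-empty.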
Inodes also contain the access permissions and time-stamps for creation, modification etc. These details are not terribly important or interesting for the moment. The only thing to note is that UNICOS, on top of supporting the traditional UNIX-style permissions, has a whole new and different permission system. If it is enabled by default, I’m in trouble: I’ll have to figure out what the related fields mean. However, there’s no indication that it’s on by default.
Directories
Inodes are only capable of describing the content of something. To make the FS useful, we need to give a name to these content ‘blobs’ and organize them. This is what directories achieve: they associate a file name with its content, that is, an inode.
So how are directories stored? Of course in an inode! While for normal files, the content of the blocks the inode references is ‘just a bunch of bytes’ as far as the OS is concerned, for directories, the format is defined: it is a set of cdirect entries. These entries are not much more than a mapping between a name and an inode (which then describes the content), with one important exception: there’s a field called cd_signature. After some debugging I realized that this field is a hash of sorts of the file-name. But what kind? There are so many to choose from! The only way to figure that out was to look at the instruction traces of the kernel trying to access a directory entry on the hard drive. From that work, the following algorithm emerged:
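#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Helper defined elsewhere in the utility: reverses the byte order of a 64-bit word.
uint64_t SwapBytes(uint64_t aWord);

uint32_t CalcHash(const char *aFileName) {
    size_t Length = strlen(aFileName);
    // Zero-filled buffer of whole 64-bit words, so the name ends up zero-padded.
    std::vector<uint64_t> Buffer((Length + 7) / 8);
    std::fill(Buffer.begin(), Buffer.end(), 0);
    if (Length > 0) {
        // Guard: taking the address of element 0 of an empty vector is undefined.
        memcpy(Buffer.data(), aFileName, Length);
    }
    // XOR all the (big-endian) words of the name together...
    uint64_t XHash = 0;
    for (auto &Word : Buffer) {
        XHash ^= SwapBytes(Word);
    }
    // ...then fold the result and keep the low 22 bits (the width of cd_signature).
    uint64_t XHash2 = XHash + (XHash >> 7) + (XHash >> 17) + (XHash >> 27) + (XHash >> 37) + (XHash >> 47);
    XHash2 = XHash2 & 0x3fffff;
    return uint32_t(XHash2);
}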
The code is a bit hacky, but does the job. The memcpy is needed to make sure that the file-name is zero-padded to 64-bit boundaries, and the SwapBytes call is there to rectify the endianness differences between the host (x86) and the target (Cray).
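With the signature in hand, a directory entry can be serialized on the host. The sketch below is one way this might look, under a few assumptions that are not confirmed by the headers alone: that the Cray compiler packs bit-fields starting from the most significant bit (so the third word is 10 reserved bits, 22 signature bits, 22 record-length bits and 10 name-length bits), that words are stored big-endian, and that the record length is the minimum CDIRSIZ() would compute (three header words plus the name padded to a word boundary). `AppendWordBE` and `BuildDirEntry` are made-up names; the signature is passed in by the caller, for example from CalcHash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr size_t kWordSize = 8;                    // bytes per Cray word
constexpr size_t kDirHeaderSize = 3 * kWordSize;   // cd_ino, cd_sino, packed bit-fields

// Append a 64-bit word most-significant byte first.
void AppendWordBE(std::vector<uint8_t> &aOut, uint64_t aWord) {
    for (int Shift = 56; Shift >= 0; Shift -= 8) {
        aOut.push_back(uint8_t(aWord >> Shift));
    }
}

// Build a minimal-size directory entry for aName pointing at aInode.
// ASSUMPTION: bit-fields are packed from the most significant bit down.
std::vector<uint8_t> BuildDirEntry(uint64_t aInode, uint32_t aSignature,
                                   const std::string &aName) {
    assert(aName.size() <= 255);  // CDMAXNAMELEN
    const size_t PaddedName = (aName.size() + kWordSize - 1) & ~(kWordSize - 1);
    const uint64_t RecLen = kDirHeaderSize + PaddedName;
    std::vector<uint8_t> Entry;
    AppendWordBE(Entry, aInode);  // cd_ino
    AppendWordBE(Entry, 0);       // cd_sino (reserved)
    AppendWordBE(Entry, (uint64_t(aSignature & 0x3fffff) << 32) |   // cd_signature
                        ((RecLen & 0x3fffff) << 10) |               // cd_reclen
                        (uint64_t(aName.size()) & 0x3ff));          // cd_namelen
    Entry.insert(Entry.end(), aName.begin(), aName.end());          // cd_name, not null-terminated
    Entry.resize(RecLen, 0);                                        // zero-pad to a word boundary
    return Entry;
}
```

In a real directory block the last entry's record length would presumably be stretched to cover the remaining free space in the chunk, as in the Berkeley format the header says this is modeled on.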
There are several other details of course that I haven’t figured out, but this is enough to implement a very basic file-system manipulation utility: one that can create a passable virtual hard drive, with a single partition on it, that contains a single file-system. The utility can also create files in the root directory of that file-system and copy their content from files on the host OS.
This utility created a functional, albeit one-way, communication channel between the host PC and the simulated target. It’s imperfect but good enough for the job. It could be extended to be more complete, potentially even to the point where the host can mount Cray FS (NC1FS) volumes, but that’s a lot of work for not much value. It would be way more interesting to bring networking up, but I’ll save that for a later post.
Back to the top
So where were we? The original problem I wanted to solve was that the file-system gets full with the OS trying to create a memory dump on a FS that’s clearly not large enough to hold one. So, armed with all this knowledge about the FS structure, what can we do?
Interestingly, the size of the FS is really only stored in a few places: the s_fsize member of the super-block, the fd_nblk field of the partition descriptors and the size of the FREEMAP (bmp_total field). Changing the first two fields is not a big deal, but changing the size of the FREEMAP is problematic: it can’t easily grow beyond the size of the block(s) it occupies. Luckily a single block (4kByte) worth of bitmap, which is the smallest allocation unit, supports disks up to 128MBytes in size, a significant extension over the 48MBytes of the initial RAM FS. So really, all it takes is patching up two or three fields to resize the partition to 128MBytes, which provides enough room for creating the dump while still leaving some extra space. Problem solved!
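The 128MByte figure follows directly from the block size: one 4kByte block holds 32768 bits, each bit covers one 4kByte block, and 32768 × 4096 bytes is 128MBytes. The arithmetic, as a small sketch (the function names are made up, and the whole map block is assumed to be usable as bitmap):

```cpp
#include <cstdint>

constexpr uint64_t kBlockSize = 4096;  // bytes per file-system block

// Number of blocks a FREEMAP of aMapBlocks blocks can describe (one bit each).
constexpr uint64_t BlocksCoveredByMap(uint64_t aMapBlocks) {
    return aMapBlocks * kBlockSize * 8;
}

// Largest file-system size, in bytes, such a FREEMAP can describe.
constexpr uint64_t MaxFsBytes(uint64_t aMapBlocks) {
    return BlocksCoveredByMap(aMapBlocks) * kBlockSize;
}

// Number of FREEMAP blocks needed for a file-system of aFsBytes bytes.
constexpr uint64_t MapBlocksNeeded(uint64_t aFsBytes) {
    return ((aFsBytes + kBlockSize - 1) / kBlockSize + kBlockSize * 8 - 1) /
           (kBlockSize * 8);
}

static_assert(MaxFsBytes(1) == 128ull * 1024 * 1024,
              "one bitmap block covers exactly 128MBytes");
static_assert(MapBlocksNeeded(48ull * 1024 * 1024) == 1,
              "the original 48MByte RAM FS fits in one bitmap block");
```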
Are we done?
Yes, yes we are. I’ll stop this rather boring wall of text here. The next one, I promise, will be much more interesting.